OVERVIEW

This report presents the development of equations and curves that
describe the capacity characteristics of the AN/UYK-20 and its
peripherals. These will form the basis for a future Capacity
Calculations handbook and will contribute to the subsequent capability
of making capacity assessments under conditions of actual utilization
of the AN/UYK-20 by a large variety of both systems and applications
software.

The means employed to assess capacity in this report are those of
Software Physics. This discipline, developed in Kolence's An
Introduction to Software Physics, mathematically derives capacity
characteristics of computing equipment from fundamental equipment
descriptions and specifications. However, capacity available is not
necessarily power actually used by a workload (software) executing on
the equipment. Software Physics theory identifies the parameters
which govern the utilization of capacity and predicts the quantity
of power actually used. This is essential to the improvement of
performance. Further discussion of capacity, power and performance,
as well as other Software Physics terminology, can be found in the
Glossary which follows the body of this report.

A  predictive  theory  must  be  tested;  and  so  the  companion  report, 
"AN/UYK-20  Capacity  Equipment  Specification"  proposes  experiments 
intended  to  verify  the  power  equations  and  curves  of  this  study. 

It is anticipated that a future report will contain the full design
of such experiments and that the comprehensive handbook of AN/UYK-20
Capacity Characteristics will be developed concurrently with the
performance of the experiments.


SECTION 1


INTRODUCTION 


1.0  GENERAL 

This is the final report for the study of AN/UYK-20 configuration
capacity performed for the Naval Ocean Systems Center, San Diego,
under Contract N6601-77-C-0252BW. Included in this report are
the exposition of methodology for, and presentation of, theoretical
capacity (power) equations and curves for typical AN/UYK-20
configurations, devices and processors.

1.1  OBJECTIVES  AND  GOALS 

The objectives of this study are to produce theoretically derived
capacity (power) characteristics for the AN/UYK-20 computer CP
and peripherals. These will form the basis for a future capacity
calculation handbook and will contribute to the subsequent
capability of making capacity assessments under conditions of actual
utilization by a large variety of both systems and applications
software units. As a consequence, methods of predicting performance
(e.g., throughput, response times, etc.) become capable of
realization. Additionally, were characterization techniques thoroughly
developed for AN/UYK-20 instruction mixes and IOC command sequences,
the software units could then become subject to control of
performance and capable of optimization at the design stage.

It is a particular feature of this study that, for the goal of
describing actual performance, it considers the effects of
contention for main memory by the CP, IOC and DMA facility on CP
capacity. Thus CP execution power will be described as a function of
concurrent IOC or DMA power used.
1.2 SCOPE AND APPROACH

The following sections present a series of equipment capacity
equation and curve developments leading to the higher level
IOC/channel configuration power characteristics. Each development of
capacity equations and curves for peripheral devices is preceded
by a brief list of pertinent device specifications.

More complete information is to be found in specifications
provided by the manufacturers.

Three peripheral devices were selected for analysis. These
are the AN/USH-26 Cartridge Magnetic Tape Unit (CMTU), the
AN/USH-23 Disk Controller/Storage System and the AN/USQ-69
Keyboard/Display unit. For the first two, capacity curves
are also developed for multiple units operating with a single
controller; that is, for a control unit configuration. These
curves are shown tabulated and plotted for select cases of
parameterization. The presentation of more extensive
tabulation will be the function of an anticipated software physics
handbook for these devices and configurations.

The subsequent sections of this report present a development
of a CP power methodology for the AN/UYK-20. This includes a
vector formulation of CP forces with components determined
by the various containers operated on and by the nature of
the action. From this we will present a methodology developing
CP power for classes of instructions and the workloads of
which they are constituents. The impact of IOC or DMA
contention with the CP for main memory is then analyzed and the
effects on CP execution power are developed for some specific
instruction classes and a typical instruction mix.

These methods and results should be considered the predecessors
of a comprehensive handbook of AN/UYK-20 capacity methodology
and tabulations, offering the potential for using software
physics theory and results in effective hardware configuration
design and in the design and implementation of system or
applications software.


1.3  REFERENCE  PUBLICATIONS 


This report assumes familiarity with the fundamentals and
terminology of Software Physics and some familiarity with the AN/UYK-20
and its peripherals. The following two publications are offered
as containing material adequate to satisfy these prerequisites.

(a) Kolence, Kenneth, "An Introduction to Software Physics",
Institute for Software Engineering, Inc., Palo Alto, 1977.

(b) Sperry Univac Defense Systems, "AN/UYK-20 Technical
Description", Publication number PX10431C, Sperry Univac
Corporation, St. Paul, Minn.

1.4 ACKNOWLEDGEMENT

The  author  gratefully  acknowledges  the  contributions  to  the 
development  of  this  report  provided  by: 

William  J.  Dejka  (NOSC) 

Robert  B.  Holland  (System  Industries,  Inc.) 

Kenneth  W.  Kolence  (Institute  for  Software  Engineering) 

John  Westergren  (Sperry-Univac) 

William  Yonkers  (Qantex,  Inc.) 
SECTION 2


AN/USH-26  CMTU  POWER 


2.1 GENERAL

The AN/USH-26 Cartridge Magnetic Tape Unit (CMTU), like many other
tape devices, operates in the Software Physics Type 1 mode. This
means that the control unit and channel are in execution whenever
an individual drive is performing actions consequent on the
issuance of a read or write command.

Thus, when only these read/write related actions are considered,
the α or β configuration power is:

(a) Independent of the number of drives in the α or β configuration.

(b) Identical to the power curve for a single drive using the
    average blocksize for the entire α or β configuration.

2.2 CMTU CHARACTERISTICS

(a) Tape speed             -  30 ips
(b) Start-up time          -  30 ± 1 msec
(c) Interrecord gap (IRG)  -  1.2 - 1.8 inches (1.6 - 1.8 inches
                              for successive read/write operations)
(d) Recording density      -  1600 bits/inch (serial)
(e) Data rates             -  320 µsec between 16 bit words (nominal)
(f) Overhead bytes         -  16 bit preamble
                              16 bit CRC
                              16 bit postamble
(g) Record lengths         -  2048 bytes max. recommended
(h) Rewind time            -  42 sec. for 300 ft. of tape
(i) Configuration(s)       -  1-4 drives/controller
2.3 DEVICE AND CONFIGURATION STATE CHARTS


The  STANDARD  CYCLE  STATE  CHART,  Figure  2.1,  shows  the  drive/control 
unit/channel  busy  states  for  read/write  actions. 

The  FULL  STATE  CHART,  Figure  2.2,  shows  the  busy  states  for  all 
drive  activating  commands. 
[Figure 2.1: STANDARD CYCLE STATE CHART (Type 1), AN/USH-26 CMTU.
Drive states shown: Search (IRG), Action (Read/Write), Term (EOR).]
2.4 CMTU DRIVE POWER

As an instance of type 1 (tape) power, we have that:

$$P(tp,CMTU) = \frac{W}{t_1 + t_2 + t_4 + W \div MBTR}$$

where:  MBTR = tape speed × density = 30 × 200 = 6 × 10^3 bytes/sec

        t_1 = IRG time = l(IRG)/tape speed
            = 1.7/30 = 56.7 msec

        t_2 = Record preamble time = # preamble bytes ÷ MBTR
            = 2 ÷ (6 × 10^3) = 0.333 msec

        t_4 = Record CRC + postamble time
            = (# CRC bytes + # postamble bytes) ÷ MBTR
            = (2+2) ÷ (6 × 10^3) = 0.667 msec.

With times in msec and W in bytes:

$$P(tp,CMTU) = \frac{W}{56.7 + 0.333 + 0.667 + (W \div 6 \times 10^{3}) \times 10^{3}} = \frac{W}{57.7 + 0.167\,W} \ \text{KW/sec} \qquad (2.1)$$

and the block size efficiency

$$\frac{P(tape,CMTU,W)}{P(tape,CMTU)_{asymptotic}} = \frac{P(tape,CMTU,W)}{6 \times 10^{3}\ \text{bytes/sec}} \qquad (2.2)$$

Equations (2.1) and (2.2) are tabulated in Table 2.1 and plotted in
Figure 2.3 for a wide range of data blocksizes.

The defining characteristic of type 1 power is that the channel and
control unit are busy throughout the device standard cycle. As a
consequence, no power-producing overlap is possible between multiple
drives on a control unit or channel, and the power characteristics
of the α (channel) or β (control unit) configurations are identical
to those of a single drive.
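The tabulation that follows can be reproduced with a few lines of code. The sketch below is only illustrative: the function names are hypothetical, and it assumes the rounded coefficients 57.7 msec and 0.167 msec/byte that appear in Equation (2.1), with the asymptotic power taken as 6 KW/sec.

```python
# Sketch of Equations (2.1) and (2.2) for the AN/USH-26 CMTU.
# Times are in msec and W in bytes, so power comes out in bytes/msec,
# which is numerically the KW/sec unit used in the report.

ASYMPTOTIC_KW_PER_SEC = 6.0  # MBTR = 30 ips x 200 bytes/inch = 6000 bytes/sec
FIXED_OVERHEAD_MS = 57.7     # t1 + t2 + t4 = 56.7 + 0.333 + 0.667
MS_PER_BYTE = 0.167          # rounded 1000/6000, the coefficient in (2.1)


def cmtu_execution_time_ms(w):
    """Execution time (msec) of one read/write standard cycle for W bytes."""
    return FIXED_OVERHEAD_MS + MS_PER_BYTE * w


def cmtu_power(w):
    """Tape power in KW/sec for blocksize W, per Equation (2.1)."""
    return w / cmtu_execution_time_ms(w)


def cmtu_efficiency_pct(w):
    """Blocksize efficiency in percent of asymptotic power, per (2.2)."""
    return 100.0 * cmtu_power(w) / ASYMPTOTIC_KW_PER_SEC
```

For example, `cmtu_power(1000)` evaluates 1000 / 224.7, the 1,000-byte row of the tabulation.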
BLOCKSIZE     EXECUTION TIME     TAPE POWER     % BLOCKSIZE
(Bytes)       (10^-3 sec)        (KW/sec)       EFFICIENCY

       80          71.06            1.126          18.77
      120          77.74            1.544          25.73
      250          99.45            2.514          41.90
      500         141.2             3.541          59.02
    1,000         224.7             4.450          74.17
    2,000         391.7             5.105          85.08
    3,000         558.7             5.370          89.50
    4,000         725.7             5.512          91.87
    5,000         892.7             5.600          93.33
    6,000        1060.              5.662          94.39
   10,000        1728.              5.788          96.47
   20,000        3398.              5.886          98.10
  100,000       16760.              5.967          99.45
1,000,000      167057.              5.986          99.77

CONTINUOUS READ/WRITE POWER (Type 1)
AN/USH-26 CARTRIDGE MAGNETIC TAPE UNIT

Table 2.1
[Figure 2.3: CONTINUOUS READ/WRITE POWER - AN/USH-26 CARTRIDGE
MAGNETIC TAPE UNIT, Channel/Control Unit/Drive. The asymptotic power
is marked on the plot.]


SECTION  3 


AN/USH-23  DISK  SYSTEM  POWER 


3.0  GENERAL 

The  AN/USH-23  disk  system  (currently  the  System  Industries  Model 
3500)  consists  of  a  control  unit  and  from  one  to  eight  drives 
with  either  a  fixed  or  removable  disk  platter. 

The  control  unit  and  drives  operate  in  the  Software  Physics  type 
2  mode;  that  is,  the  initial  positioning  seek  on  one  drive  may 
be  overlapped  with  actions  on  others.  Data  records  are  formatted 
into  fixed  length  sectors  on  the  disk  and  the  system  is  capable 
of  reading  or  writing  records  that  span  sector,  track  or  cylinder 
boundaries  as  effectively  one  operation. 

3.1  AN/USH-23  DISK  SYSTEM  CHARACTERISTICS 

3.1.1  System 

(a)  System  Capacity  -  19488  k  Bytes 

(b)  Up  to  8  drives  may  be  attached  to  a  single  controller. 
Drives  5-8  are  daisy-chained  from  1-4  and  share  controller 
registers. 

At  least  one  drive  in  a  daisy-chained  pair  must  be  a 
removable  cartridge  type. 

(c) Addressing by disk sector (type 2).

(d) Overlap seek permits concurrent seeking on up to 8 drives.
Subsequent data transfer may occur after seek on one drive
while others are still seeking.

(e)  Spanning  of  sectors,  tracks  and  cylinders  continues  without 
IOC  action  required  until  the  word  count  is  satisfied. 


3.1.2 Drives (DIABLO models 31/33F)

(a) Tracks per surface          -  203
(b) Tracks per cylinder         -  2
(c) Words/sector                -  256 or 128
(d) Sectors/track               -  12 or 24
(e) Disk capacity               -  2436 k Bytes
(f) Byte transfer rate (MBTR)   -  195.2 k Bytes/sec
(g) Average latency             -  20 ms
(h) Head movement times:
       Cylinder to cylinder     -  15 ms
       Average                  -  70 ms
       200 cylinders            -  135 ms
(i) Sector format:
       i)   First preamble      -  20 bytes
       ii)  Sector address word -  2 bytes
       iii) Sector status word  -  2 bytes
       iv)  Second preamble     -  20 µsec
       v)   Data                -  512/256 bytes
       vi)  CRC                 -  2 bytes
(j) Interrecord gap:
       12 sector format         -  524 µsec
       24 sector format         -  168 µsec
(k) Maximum transfer            -  128 k Bytes

3.2  AN/USH-23  DRIVE  POWER 

3.2.1  Single  Sector  Standard  Cycle 

Figure  3.1  presents  a  standard  cycle  for  the  input/output  of 
a  single  sector  on  the  series  30  disk  drive.  There  are  certain 
irregularities  for  this  drive  as  compared  to  the  ordinary  type 
2  drive  power.  These  are: 
                   Seek     Search (1)   Search (2)   Action (r/w)   Term

initial action     t10      t20          t22          t3             t41, t42
track boundary     0        0            t22          t3             t41, t42
cyl boundary       t11      t21          t22          t3             t41, t42

where:

t10 = stand alone seek time
t11 = cyl to cyl positioning time
t20 = average rotational delay                        -  20 ms
t22 = sector preamble time                            -  0.14 ms
t21 = rot delay after cylinder to cylinder
      repositioning                                   -  25 ms
t3  = read/write action time
         12 sector format                             -  2.611 ms
         24 sector format                             -  1.306 ms
t41 = CRC read/write time                             -  .01 ms
t42 = overhead (interrecord) time
         12 sector                                    -  .524 ms
         24 sector                                    -  .128 ms

Note:

(a) t10 is the stand alone seek time, for an overlap seek operation.

(b) Initial action search time (t20 + t22) includes an unexecuted
    seek.

MODEL 31/33F TYPE 2 STANDARD CYCLE
(SINGLE SECTOR)
Figure 3.1
(a) There are two search substates due to the sector format
and the fact that sector, track and cylinder spanning is
possible for input/output of a single logical record.

The first substate represents the time of expected latency
after an initial seek (t20) or of rotational delay after
cylinder repositioning (t21). The second substate
represents the time to detect and pass over the first record
preamble (t22). Note that the rotational delay after
cylinder to cylinder repositioning is a fixed value, not
a statistical average as for initial seek expected latency.

We have therefore that:

i) After the initial seek:

   t2 = search time = t20 + t22

   where t20 = average rotational delay = 20 ms.
         t22 = sector preamble time = 0.14 ms.

ii) After a cylinder to cylinder repositioning:

   t2 = t21 + t22

   where t21 = rotation time - seek time
             = 40 - 15 = 25 ms.
         t22 = sector preamble time = 0.14 ms.

iii) Within a cylinder:

   t2 = t22 = 0.14 ms.

(b) There are two termination substates, t41 and t42, due to
the fact that interrecord overhead must be accounted for
in all but the last sector read or written for a logical
record.

Thus t4 = termination time = t41 + t42 for all but the last sector

     t4 = t41 for the last sector.

3.2.2 Formulation of Tx(S,δ)

As multiple sectors, tracks and cylinders can be spanned during
a single input/output operation on a Model 31/33F drive, we
will need to formulate an expression for the device execution
time that accounts for the boundary events noted in Figure 3.1.

We let $N_{ij}$ equal the number of occurrences of the events whose
time is $t_{ij}$; e.g., $N_{21}$ is the number of search (1) events
that follow a cylinder to cylinder repositioning.

Let b = bytes/block (sector)
    s = sectors/track
    c = tracks/cylinder

Let σ = the sector offset of the first block on the commencing
track (0-s)

Let τ = the track offset of the first track on the commencing
cylinder (0-c)

Now for a given byte count W,

$$B = \text{block occupancy} = \left[\frac{W}{b}\right]^{+} \qquad (3.1)$$

where $[a]^{+}$ = the first integer ≥ a

$$T = \text{track occupancy} = \left[\frac{B}{s}\right] + \left[\frac{B \bmod s + \sigma}{s}\right]^{+} \qquad (3.2)$$

where $[a]$ = integer portion of a

$$C = \text{cylinder occupancy} = \left[\frac{T}{c}\right] + \left[\frac{T \bmod c + \tau}{c}\right]^{+} \qquad (3.3)$$
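The occupancy expressions (3.1) through (3.3) can be checked numerically. The sketch below assumes the reading in which B is the byte count divided by the block size and rounded up, and T and C each add a whole-unit count to a rounded-up remainder term that includes the starting offset. The function and parameter names are hypothetical, and the offsets are taken as zero-based.

```python
def ceil_div(a, b):
    """First integer >= a/b for non-negative integers (the [x]+ operator)."""
    return -(-a // b)


def occupancy(w, b, s, c, sigma=0, tau=0):
    """Return (B, T, C): blocks, tracks and cylinders touched by W bytes.

    w      byte count of the logical record
    b      bytes per block (sector)
    s      sectors per track
    c      tracks per cylinder
    sigma  zero-based sector offset of the first block on its track
    tau    zero-based track offset of the first track on its cylinder
    """
    blocks = ceil_div(w, b)                                     # (3.1)
    tracks = blocks // s + ceil_div(blocks % s + sigma, s)      # (3.2)
    cylinders = tracks // c + ceil_div(tracks % c + tau, c)     # (3.3)
    return blocks, tracks, cylinders
```

With the 12 sector format (b = 512, s = 12, c = 2), a 2048-byte record starting at offset zero gives `occupancy(2048, 512, 12, 2)` equal to (4, 1, 1), while a 14-block record starting at the last sector of a track spans three tracks and two cylinders.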
So we have:

N11 = C - 1   (first cylinder seek time is t10)
N21 = C - 1   (we have cylinder to cylinder rotational delay
               on all but the first cylinder)
N22 = B       (one preamble for each block)
N3  = B       (data portion each block)
N41 = B       (CRC termination each block)
N42 = B - 1   (overhead IRG for all blocks but the last)

So for a logical record of W bytes:

$$Tx(S,\delta) = t_{10} + N_{11}t_{11} + t_{20} + N_{21}t_{21} + N_{22}t_{22} + N_{3}t_{3} + N_{41}t_{41} + N_{42}t_{42}$$

Substituting in terms of blocks and cylinders we have:

$$Tx(S,\delta) = t_{10} + (C-1)(t_{11}+t_{21}) + t_{20} + B(t_{22}+t_{3}+t_{41}) + (B-1)t_{42} \qquad (3.4)$$

3.2.3  Evaluation  of  Series  30  Drive  Power 

The power of a single Series 30 drive, denoted
P(disk,31/33F), is given by:

$$P(disk,31/33F) = \frac{W}{Tx(S,\delta)} \qquad (3.5)$$

where:  W is the work done in a standard cycle (possibly
        multisector)

        Tx(S,δ) is the execution time of the drive for the
        standard cycle as given by Equation (3.4).
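As a worked illustration of Equations (3.4) and (3.5), the sketch below evaluates the execution time and drive power for the 12 sector format. The timing constants are the values listed with Figure 3.1 and in the drive characteristics; the function names and the choice to pass B and C as arguments are assumptions made for the example.

```python
# Timing values (msec) for the Model 31/33F drive, 12 sector format.
T11 = 15.0    # cylinder to cylinder positioning time
T20 = 20.0    # average rotational delay after the initial seek
T21 = 25.0    # rotational delay after cylinder to cylinder repositioning
T22 = 0.14    # sector preamble time
T3 = 2.611    # read/write action time per sector
T41 = 0.01    # CRC read/write time
T42 = 0.524   # interrecord overhead time


def tx_ms(blocks, cylinders, t10):
    """Device execution time Tx(S,delta) in msec, per Equation (3.4)."""
    return (t10
            + (cylinders - 1) * (T11 + T21)
            + T20
            + blocks * (T22 + T3 + T41)
            + (blocks - 1) * T42)


def drive_power(w, blocks, cylinders, t10):
    """Drive power in KW/sec (bytes per msec), per Equation (3.5)."""
    return w / tx_ms(blocks, cylinders, t10)
```

For a 2048-byte record (B = 4, C = 1) and the average seek time of 70 ms, `tx_ms(4, 1, 70.0)` is about 102.6 msec, so the single drive power is roughly 20 KW/sec.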
Table 3.1 presents P(disk,31/33F) tabulated for values of W that
are equivalent to the start, midpoint and endpoint of sectors
to beyond the capacity of a cylinder. It thus shows the
discontinuous power drops at the ends of sectors and at the
cylinder boundary. The asymptotic power is, as usual, equivalent
to the maximum byte transfer rate (MBTR), which for this drive
is 195.2 KW/sec. Figure 3.1a is a plot of the initial part of
the power curve showing in detail the sector endpoint
discontinuities which give the function a sawtooth shape. The
subsequent Figure 3.1b additionally shows the larger
discontinuity in the power function at the first cylinder boundary.

In  both  figures,  curves  are  drawn  through  the  sector  maximum 
or  minimum  power  points  defining  smooth  power  envelopes  within 
the  domain  of  a  single  cylinder. 

3.2.4  Discussion 

The  discontinuities  in  the  Model  31/33F  power  function  indicate 
that  the  optimization  of  device  power  can  depend  on  the  sector 
and  track  offsets  of  the  commencing  record.  Thus  the  designer 
should  note  that  in  general  these  offsets  should  be  chosen  so 
as  to  minimize  the  number  of  cylinder  to  cylinder  repositions 
required  to  satisfy  the  request.  Record  sizes  that  utilize  only 
a  small  fraction  of  a  sector  are  inefficient  as  well.  It  should 
be  emphasized  that  the  time  intervals  caused  by  the  spanning  of 
sectors,  tracks  or  cylinders  are  not  available  for  other  actions 
(except  previously  initiated  seeks)  by  other  devices  on  the 
same  Input/Output  channel. 
[Table 3.1b: DIABLO SERIES 30 TYPE 2 DRIVE POWER - PART II]

[Figure 3.1a: DIABLO SERIES 30 TYPE 2 DRIVE POWER (1). Horizontal
axis: blocks.]

[Figure 3.1b: DIABLO SERIES 30 TYPE 2 DRIVE POWER (2). The cylinder
boundary is marked on the plot.]

3.3  AN/USH-23  CONTROL  UNIT  AN/UYK-20  CHANNEL  POWER 


We will develop the maximum theoretical power equations and curves
for an AN/UYK-20 IOC/channel with multiple AN/USH-23 (Model 3500)
controller disk β-configurations, each with 1 to 8 Series 30 movable
head disk drives.


3.3.1  Channel/Controller/Device  States 

Although  the  model  3500  disk  subsystem  is  at  any  time  capable  of 
accepting  and  initiating  seeks  to  any  drive  not  active,  AN/UYK-20 
IOC  logic  considers  the  controller  unavailable  when  any  drive  on 
that  controller  is  engaged  in  any  of  the  following  actions: 

i) search

ii) read/write (t3)

iii) termination except last block t time.

iv) cylinder to cylinder reposition during multi-block read/write
(t11).

Figure  3.2  shows  the  channel,  control  unit  and  device  states 
during  the  first  composite  standard  cycle  (i.e.,  the  standard 
cycle  that  includes  sector  spanning) . 

The  channel  device  is  busy  whenever  the  controller  is,  so  the 
Software  Physics  Type  2  mode  characterizes  the  identical  control 
unit  and  channel  configuration  powers. 


Note that when more than 4 drives are attached to a controller,
daisy chaining is implied on one or more ports. This introduces
the possibility of increased latency time when positioning to
the starting sector on a daisy-chained drive, if it is not the
first sector occurring on the track (i.e., sector offset σ ≠ 0).
This is because in this mode, the platter index marker must be
sensed prior to any search operation.
[Figure 3.2: Channel State Chart - Type 2, Case 1 or Case 2 Startup.
Vertical axis: drives in execution.]


3.3.2  Power  Equation  Development 


As a notational convenience, let

t_φ = the composite standard cycle search, action and
      termination time

$$t_{\phi} = (C-1)(t_{11}+t_{21}) + t_{20} + B(t_{22}+t_{3}+t_{41}) + (B-1)t_{42}$$

where C = cylinder occupancy
      B = block occupancy

for a typical record of length W.


To  obtain  the  maximum  theoretical  power  with  N  drives  active, 
assume  that  drive  orders  are  always  waiting  and  that  (stand  alone) 
seeks  are  issued  whenever  possible.  We  then  have  the  following 
two  cases: 


3.3.2.1 Case 1: t10 ≤ (N-1)t_φ

In this case all steady state stand-alone seeks are concurrent
with input/output actions on the other drives. (See Figure 3.3).
We then have the steady state power

$$P(\alpha,3500,\text{case 1}) = \frac{W}{t_{\phi}} \qquad (3.6)$$

since in a standard cycle NW bytes are transferred in time
N t_φ. Note that this power is as for a single drive with
initial seek time t10 = 0.

3.3.2.2 Case 2: t10 ≥ (N-1)t_φ

In this case (N-1) composite input/output actions can be
overlapped with the stand-alone seek. (See Figure 3.4).

We then have the steady state power:

$$P(\alpha,3500,\text{case 2}) = \frac{NW}{t_{10} + t_{\phi}} \qquad (3.7)$$
[Figures 3.3 and 3.4: steady state charts for Case 1 and Case 2.
Vertical axis: number of drives in execution.]

Note that when t10 = (N-1)t_φ,

$$P(\alpha,3500,\text{case 2}) = \frac{NW}{(N-1)t_{\phi} + t_{\phi}} = \frac{NW}{N t_{\phi}} = \frac{W}{t_{\phi}} = P(\alpha,3500,\text{case 1})$$

as expected.

3.3.3 α or β Configuration Power Data (AN/USH-23)

Table 3.2 and Figure 3.5 show the multiple drive powers
obtained from Equations (3.6) and (3.7) for an average byte
count W = 2048. Since these powers are derived from average
logical record lengths, uniform seek times across drives and
an inexhaustible queue of disk orders, these values are to
be considered as the average theoretical maximum powers.

We have for this logical record length:

B = 4, C = 1

t_φ = (C-1)(t11+t21) + t20 + B(t22+t3+t41) + (B-1)(t42)

    = 0 + 20 + 4(0.14+2.611+0.01) + 3(0.524)

    = 32.62 msec.

The case 1 equation becomes:

t10 ≤ (N-1)t_φ, i.e., t10 ≤ (N-1)(32.62)

$$P(\alpha,3500,\text{case 1}) = \frac{2048}{32.62} = 62.79 \ \text{KW/s}$$

And for case 2:

t10 ≥ (N-1)t_φ, i.e., t10 ≥ (N-1)(32.62)

$$P(\alpha,3500,\text{case 2}) = \frac{N \times 2048}{t_{10} + 32.62}$$

[Figure 3.5: AN/USH-23 - MODEL 3500 DISK SUBSYSTEMS, α or β
CONFIGURATION POWER (MAX), AVERAGE BLOCKSIZE = 2048 BYTES.
Vertical axis: absolute α or β configuration power, max theoretical
(KW/S). Horizontal axis: N = number of concurrently active spindles.]
3.3.4 α, β Configuration Powers as a Function of Seek Time (AN/USH-23)

For one or more Model 3500 β configurations we wish to show
absolute α or β configuration power as a function of the initial
seek time, t10. We do this by noting that for any number of
spindles we have:

$$P(\alpha,3500) = P(\alpha,3500,\text{case 1}) = \text{constant} \quad \text{when } t_{10} \le (N-1)t_{\phi}$$

and we determine the points on the line

$$P(\alpha,3500) = P(\alpha,3500,\text{case 1})$$

intercepted by the functions

$$P(\alpha,3500) = P(\alpha,3500,\text{case 2}) = \frac{NW}{t_{10} + t_{\phi}}$$

by setting

$$\frac{NW}{t_{10} + t_{\phi}} = P(\alpha,3500,\text{case 1})$$

These functions are shown plotted as a function of the seek time
in Figure 3.6 for a value of W = 2048. The value of P(α,3500)
maintains the case 1 constant value (62.79 KW/s) until the
indicated intercepts for each N. Note that for N > 5, the intercept
value of t10 exceeds the device maximum of 135 msec, so that the
device powers remain at the case 1 value for all values of t10.
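Setting the case 2 expression NW/(t10 + t_φ) equal to the case 1 value W/t_φ gives an intercept at t10 = (N-1)t_φ. The short sketch below tabulates these intercepts for W = 2048 and checks them against the 135 msec maximum seek time. The names are hypothetical and the 32.62 msec composite time is taken from the W = 2048 evaluation.

```python
T_PHI_MS = 32.62      # composite cycle time for W = 2048 (B = 4, C = 1)
MAX_SEEK_MS = 135.0   # 200-cylinder head movement time of the drive


def intercept_seek_ms(n, t_phi=T_PHI_MS):
    """Seek time at which case 2 power meets the case 1 constant."""
    return (n - 1) * t_phi


def always_case_1(n, t_phi=T_PHI_MS, max_seek=MAX_SEEK_MS):
    """True when no physically possible seek can push N drives into case 2."""
    return intercept_seek_ms(n, t_phi) > max_seek


# Intercepts for one to eight active spindles, rounded to 0.01 msec.
INTERCEPTS = {n: round(intercept_seek_ms(n), 2) for n in range(1, 9)}
```

The intercept for five spindles is 130.48 msec, just under the device maximum, while six spindles give 163.1 msec, which is why the case 1 value holds for every seek time once N exceeds 5.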
[Figure 3.6: MODEL 3500 α or β CONFIGURATION POWER - FUNCTION OF
INITIAL SEEK TIME AT AVERAGE BLOCKSIZE = 2048]


SECTION  4 


AN/USQ-69  (ADD  DISPLAY)  POWER 


4.1  GENERAL 

The  AN/USQ-69  Alphanumeric  Digital  Data  (ADD)  Display  is  a  keyboard/ 
CRT  device  capable  of  data  entry  to  and  data  display  from  the 
AN/UYK-20  computer.  Modes  of  operation  include  a  block  burst  mode 
for  input  or  display  and  an  input  only  character  mode.  It  is  the 
first  of  these  modes  which  will  provide  us  the  basis  for  a  richer 
analysis. 

4.2  AN/USQ-69  ADD  DEVICE  CHARACTERISTICS 

(a)  Input/Output  Modes: 

i)  Burst  mode  (Input  or  Display) : 

Block  transfers  to  or  from  internal  memory 

up  to  2000  characters  (standard) 
up  to  6000  characters  (optional) 

ii)  Character  mode  (Input  only) 

(b)  Display: 

i)  Capacity:  2000  characters  (25  lines  @  80  characters) 
ii)  Refresh:  Period  -  16.7  msec.  Time  -  1.36  msec. 

(c) Interfaces:

i) Parallel

MIL-STD-1397 (A), (B) or (C) parallel channels in 8 bit mode.

ii) Serial

MIL-STD-188 or EIA-STD-RS232C serial asynchronous channels @ 2400 baud

MIL-STD-188 serial synchronous channels @ 9600 baud

(d) Configuration:

i) Each ADD input/display includes a dedicated controller.
ii) Up to 8 displays may be daisy-chained on one asynchronous
serial channel.
4.3 ADD Input (keyboard) Power

4.3.1  Burst  Mode 

For  any  of  the  specified  interfaces,  transfer  times  from  the 
ADD  device  buffer  to  the  AN/UYK-20  are  small  compared  to  execution 
times  at  the  ADD  device  and  controller  level;  that  is,  the  time 
required  to  key  a  block  message.  This  latter  quantity  is,  of 
course,  extremely  variable  and  will  not  be  assessed  here. 

4.3.2  Character  Mode 

The  slowest  interface  constrains  the  character  transmission 
rate  to  2400/8  =  300  characters  per  second  so,  here  too,  it  is 
operator  key-in  rates  and  functions  that  effectively  determine 
device  power. 

4.3.3  ADD  Output  (Display)  Power 

(a) The output state chart of Figure 4.1 shows a single
transmit-display cycle:

[Figure 4.1: AN/USQ-69 DISPLAY STATE CHART. Rows: Channel (transmit),
Tx(S,α); ADD Controller (write mem.), Tx(S,β); ADD Display (display,
followed by the settling scan), Tx(S,δ).]
During a transmit-display cycle, the controller becomes busy for
the time that data is transmitted to ADD memory. The display
execution time, Tx(S,δ), is given as the time required for full
settling of the display from start of memory rewrite. This is
estimated to be the memory rewrite time plus 1.36 msec, this
latter quantity being the time required to scan a single CRT
frame. Subsequent screen refreshes are not considered as part
of device execution.

We thus have the following equation for the theoretical maximum
ADD display output power:

$$P(\text{ADD display}) = \frac{W}{W \div MBTR_{\alpha} + 1.36} \ \text{KW/S} \qquad (4.1)$$

where W is the average block length transmitted

and MBTR_α is the maximum byte transfer rate of the
channel-interface device.

4.3.3.1 Parallel Channels (8 bit mode)

(a) For the MIL-STD-1397 type A interface:

$$P(\text{ADD Display [1397(A)]}) = \frac{W}{W \div 41.6 + 1.36} \ \text{KW/S} \qquad (4.2)$$

Table 4.1 and the graph of Figure 4.2 show ADD device
power for block lengths of up to a full screen (2000 bytes).
Block Length     Execution Time     ADD Display Power     % Block Length
(Bytes)          (10^-3 sec)        (KW/S)                Efficiency

      80              3.28               24.4                 58.7
     160              5.21               30.7                 73.8
     240              7.13               33.7                 81.0
     400             11.0                36.4                 87.5
     800             20.6                38.9                 93.5
    1200             30.2                39.7                 95.4
    1600             39.8                40.2                 96.6
    2000             49.4                40.5                 97.4

AN/USQ-69 DISPLAY POWER
MIL-STD-1397 (A) INTERFACE

Table 4.1

(b) For the MIL-STD-1397 B and C interfaces:

$$P(\text{ADD display [1397(B,C)]}) = \frac{W}{W \div 190 + 1.36} \ \text{KW/sec} \qquad (4.3)$$

This equation is tabulated in Table 4.2 and plotted in Figure 4.3.

Block Length     Execution Time     ADD Display Power     % Block Length
(Bytes)          (10^-3 sec)        (KW/S)                Efficiency

      80              1.78               44.9                 23.6
     120              1.99               60.3                 31.7
     240              2.62               91.5                 48.2
     400              3.47              115.4                 60.7
     800              5.57              143.6                 75.6
    1200              7.68              156.3                 82.3
    1600              9.78              163.6                 86.1
    2000             11.9               168.3                 88.6

AN/USQ-69 DISPLAY POWER
MIL-STD-1397 (B), (C) INTERFACES

Table 4.2
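The display power tabulations for the parallel interfaces can be regenerated from the general form W / (W ÷ MBTR + 1.36). The sketch below is illustrative only. The 190 KW/sec rate is the one stated for the type B and C interfaces; the 41.6 KW/sec rate for type A is an assumption inferred from the execution times tabulated in Table 4.1, and the function names are hypothetical.

```python
SETTLE_MS = 1.36          # time to scan a single CRT frame, msec

MBTR_TYPE_A = 41.6        # KW/sec, assumed: inferred from Table 4.1
MBTR_TYPE_BC = 190.0      # KW/sec, from the type B and C equation


def add_execution_time_ms(w, mbtr):
    """Memory rewrite time plus one settling scan, in msec."""
    return w / mbtr + SETTLE_MS


def add_display_power(w, mbtr):
    """Theoretical maximum ADD display power in KW/sec."""
    return w / add_execution_time_ms(w, mbtr)


def add_efficiency_pct(w, mbtr):
    """Block length efficiency as a percentage of the interface rate."""
    return 100.0 * add_display_power(w, mbtr) / mbtr
```

For a full screen of 2000 bytes, `add_display_power(2000, MBTR_TYPE_BC)` is about 168.3 KW/sec and `add_display_power(2000, MBTR_TYPE_A)` is about 40.5 KW/sec, in line with the last rows of the two tables.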


[Figure 4.2: AN/USQ-69 ADD DISPLAY POWER, MIL-STD-1397 TYPE A
INTERFACE. Vertical axis: P(ADD Display [1397(A)]) - KW/S.]

[Figure 4.3: AN/USQ-69 ADD DISPLAY POWER, MIL-STD-1397 TYPES B & C
INTERFACES]


(c) Serial Channels (8 bit mode)

For the EIA-STD-RS232C and MIL-STD-188C serial interfaces in
asynchronous mode at their maximum rate (2400 bits/sec):

$$P(\text{ADD display [RS232C, asynch]}) = \frac{W}{W \div 0.3 + 1.36} \approx 0.3 \ \text{KW/sec} \quad \text{(for range 80-2000 bytes)}$$

and in synchronous mode at their maximum rate of 9600 baud:

$$P(\text{ADD display [RS232C, synch]}) = \frac{W}{W \div 1.2 + 1.36} \approx 1.2 \ \text{KW/sec} \quad \text{(for range 80-2000 bytes)}$$

4.3.4  a,S  Configuration  Powers  (AN/USQ-69  ADD  DISPLAY) 

4.3.4.1 ADD Control Unit Power

Since  the  ADD  control  unit  is  a  device  dedicated  to  a  single 
ADD  display  it  thus  exhibits  power  characteristics  identical 
to  the  ADD  display  itself. 


4.3.4.2 ADD Channel Configuration Power

Here we have a single special case to be considered; that is,
the daisy-chaining of two to eight ADD displays from a single
asynchronous channel. We examine the power for a single set of
transmissions to each device. The state chart of Figure 4.4
shows that the settling scans of the first N-1 devices coincide with
data transmission to subsequent devices. So, for N ADD
devices daisy-chained on an asynchronous serial channel
(MIL-STD-188/EIA-STD-RS232C), and with each unit to receive
a message of average length W characters, we have for maximum
theoretical channel configuration power:


$$P(a, \text{ADD display [188/RS232C]}) = \frac{NW}{NW \div \mathrm{MBTR} + 1.36} = \frac{NW}{(NW \div 300) \times 10^{3} + 1.36} \tag{4.4}$$
[Figure 4.4: AN/USQ-69 Channel State Chart]


Note that for block sizes >= 80 (one line) we have
$(NW \div 300) \times 10^{3} \gg 1.36$ for all N. We may thus conclude
that in block mode it is the channel rate (300 characters per
second) that determines the maximum power (0.300 kw/s).
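The conclusion that the 300 character/sec channel rate dominates can be checked numerically. This is a sketch under stated assumptions: the function name is hypothetical, and the 1.36 msec term is taken to be a single fixed overhead per set of transmissions, as in equation (4.4).

```python
def daisy_chain_power(n_devices, avg_len_chars,
                      rate_chars_per_s=300.0, overhead_ms=1.36):
    """Maximum theoretical channel configuration power in KW/S.

    Total characters sent divided by total time in milliseconds,
    where the transfer time is (N*W / rate) converted to msec.
    """
    total_chars = n_devices * avg_len_chars
    exec_ms = total_chars / rate_chars_per_s * 1e3 + overhead_ms
    return total_chars / exec_ms


if __name__ == "__main__":
    for n in (2, 4, 8):
        for w in (80, 400, 2000):
            print(n, w, round(daisy_chain_power(n, w), 4))
```

Every combination in the 80 to 2000 character range stays just under 0.300 KW/S, which is the behaviour the text describes.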


SECTION 5

AN/UYK-20  CHANNEL  DEVICE  AND  CONFIGURATION  POWERS 


5.1  GENERAL 

The AN/UYK-20 performs input/output activity through an incorporated
input/output controller (IOC) which operates substantially
independently of the CP.

Each  IOC-peripheral  parallel  mode  interface  consists  of  an  output 
channel  to  transmit  data  and  control  functions  to  the  peripheral 
device.  Input  channels  are  used  to  receive  data  or  interrupt 
codes  from  the  external  device.  All  parallel  mode  input/output 
activity  is  asynchronous,  with  the  timing  (and  hence  power) 
dependent  on  the  speed  of  the  peripheral  device. 

Serial  I/O  channels  are  also  available  for  communications  circuits 
which  operate  in  either  synchronous  or  asynchronous  modes.  The 
IOC  performs  all  necessary  serial-to-word  and  word-to-serial 
conversions. 

5.2  CHANNEL  DEVICE  CHARACTERISTICS  AND  POWERS 
5.2.1  Parallel  I/O  Channels 

These  are  supplied  in  groups  of  4  input  and  4  output  channels 
operating  in  the  full  duplex  mode,  permitting  concurrent  input 
and output. Furthermore, these channels may be operated in
byte  (8  bit),  single  (16  bit),  or  dual  (32  bit)  modes,  the  last 
requiring  the  use  of  a  channel  n  and  n  +  4,  i.e.,  the  use  of  2 
groups.  Maximum  transfer  rates  and  powers  are  given  in  Table 
5.1  for  the  16  and  32  bit  word  modes  for  input  or  output. 

Rates  and  powers  are  also  given  for  concurrent  input/output 
computed  as  1.75  times  the  unidirectional  rate*  subject  to  a 
maximum  of  1,000,000  16-bit  words/sec;  i.e.,  2000  kw/s. 


*  ref.  SPERRY-UNIVAC  PX  11772 


[Table 5.1: maximum transfer rates and powers by interface voltage (mode) and number of channels active (1-4, 5-8, 9-12, 13-16).]


5.2.2  Serial  I/O  Channels 


These are provided in 2 channel groups. Serial-to-word and
word-to-serial conversions are performed by the AN/UYK-20 IOC.

Maximum  rates  and  powers  for  the  various  interfaces  are  as 
follows: 

(a)  NTDS  SERIAL  CHANNEL: 

125,000  32  bit  words/sec  equivalent  to  power  500  kw/sec. 

(b)  EIA-STD-RS232C  and  MIL-STD-188C 
SERIAL  CHANNELS: 

Asynchronous: 2400, 1200, 600, 300, 150 or 75 bits/sec,
equivalent to powers of
300, 150, 75, 37.5, 18.75 or 9.375 works/sec.

Synchronous: Up to 9600 baud, i.e., equivalent to
1200 works/sec.
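The serial powers quoted above follow from one work per 8-bit byte. A minimal sketch of that conversion, assuming (as the quoted figures imply) that no start, stop or parity bits are counted against the line rate; the function name is hypothetical:

```python
BITS_PER_WORK = 8  # one work corresponds to one 8-bit byte


def serial_power_works_per_s(bits_per_s):
    """Convert a serial line rate in bits/sec to power in works/sec."""
    return bits_per_s / BITS_PER_WORK


if __name__ == "__main__":
    for rate in (9600, 2400, 1200, 600, 300, 150, 75):
        print(rate, serial_power_works_per_s(rate))
```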

5.3  CHANNEL  CONFIGURATION  POWER 
5.3.1  Discussion 

Maximum theoretical channel configuration powers depend on the
kinds of devices attached, and for this reason these powers were
developed along with the powers for the devices and their
controllers.

In  general,  however,  note  that: 

(a) Type 1 (e.g., tape) devices produce channel powers equivalent
to those of a single drive.

(b) Type 2 (e.g., disk) drives produce channel powers which
depend on the number of drives concurrently active, up to
a limit value beyond which the addition of drives adds
no more to the maximum theoretical power.

(c) IOC configuration power (that is, the total input/output
configuration power) is obtained by simple algebraic addition of
the individual channel powers developed in those sections
pertaining to attached devices. This sum is, however,
subject to the constraint that total IOC power cannot exceed
2000 KW/sec.

Finally, it must again be noted that the computed IOC configuration
power, being the sum of maximum theoretical powers, is
itself a theoretical maximum and will thus not be achieved in
practice except instantaneously.
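Rule (c) amounts to a capped sum. The sketch below illustrates it; the function name and the example channel powers are hypothetical, and only the 2000 KW/sec ceiling comes from the text.

```python
IOC_LIMIT_KW_PER_S = 2000.0  # ceiling on total IOC power stated in the text


def ioc_configuration_power(channel_powers_kw_per_s):
    """Sum the individual channel powers, limited to the IOC ceiling."""
    return min(sum(channel_powers_kw_per_s), IOC_LIMIT_KW_PER_S)


if __name__ == "__main__":
    # Hypothetical configuration: one NTDS serial channel, two slow links.
    print(ioc_configuration_power([500.0, 0.3, 1.2]))
    # A configuration whose raw sum exceeds the ceiling is clipped.
    print(ioc_configuration_power([1500.0, 1500.0]))
```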
SECTION 6


AN/UYK-20  CP  POWER 


6.1  INTRODUCTION 

In this section, we will develop a methodology and notation for
expressing AN/UYK-20 CP power in terms of the power of individual
instructions or classes of instructions. Work performed for
instruction setup will be considered as well as work done on
operands (data).

The final subsections will consider the effects of concurrent
IOC and DMA facility operation on CP power; that is, we will
express CP capacity when executing certain instructions and
instruction classes in terms of concurrent IOC or DMA power.

The resulting equations for I/O-activity-degraded CP power will
be tabulated and graphed for an instruction mix considered
typical by the manufacturer.

6.2  AN/UYK-20  CP  ARCHITECTURE  AND  CHARACTERISTICS

The AN/UYK-20 CP is, in actuality, emulated by a microprogrammed
controller (MPC), a set of registers, and a two-bus data exchange
structure. Thus the execution of AN/UYK-20 instructions results
in the execution of microprogrammed code with data and control
bits shuttled to and from the data and program registers and
main memory via the source and destination buses.

Some pertinent CP and memory access characteristics are:

(a)  Instruction  formats  -  lengths 

RR (Register/Register) - 16 bit
RI (Register/Indirect Memory) - 16 bit
RK (Register/Literal Constant) - 32 bit
RX (Register/Indexed Address) - 32 bit

(b) 16 general purpose registers of 16 bits each.

(c) MPC cycle time - 155 ± 5 nsec.

(d) Direct addressing to 65K words.

(e) Cascaded indirect addressing:

Each  level  of  cascade  requires  double  word  fetch. 

(f)  Overlapped  fetch  on  certain  instructions. 

(g)  Memory  access  cycle  -  without  DMA  -  750  ±  10  nsec. 

with  DMA  -  790  ±  10  nsec.  max. 

(h) DMA access through additional ports on each 32K word bank
of memory.

(i)  CP  has  priority  over  DMA  for  main  memory  access. 

(j)  IOC  has  priority  over  CP  for  main  memory  access. 

6.3  AN/UYK-20  CP  POWER  DERIVATION 

6.3.1  CP  Software  Work  -  Force  Vectors 

For our purposes, CP work for a given workload, L, may be
categorized in terms of the domains and ranges of action by:

(a) $W(L,y)_M$ - CP/memory work

(1 unit for every byte transferred between the cpu [or
more specifically a cpu register] and memory.)

(b) $W(L,y)_R$ - register/register work

(1 unit for every byte transferred between cpu registers.)

Another distinction of importance is that between the kinds of
CP work done when the CP is in states performing:

(a) Setup/termination work - effectively, the work of instruction
fetch.

Denoted: $W_I(L,y)_M$, $W_I(L,y)_R$

(b) Control function work - setting of control registers that
can be tested by the running code. This is a type of
register/register work denoted:

$W_C(L,y)_R$

(c) Data transfer work - operand (data) fetches and stores,
operand actions as per instruction definition.

Denoted: $W_D(L,y)_M$, $W_D(L,y)_R$

We will be concerned, at this time, mainly with CP/memory work,
as register to register work can be thought of as internal work
and generally represents a constant fraction of the CP/memory
work. For simplicity we will denote CP/memory work by $W(L,y)$,
or by $W_I(L,y)$ or $W_D(L,y)$ when the instruction or data states
are to be distinguished. Note that because of the extensive
property of software work:

$$W(L,y) = W_I(L,y) + W_D(L,y)$$

6.3.1.1 Software Containers

The CP instructions and operands are represented and manipulated
in portions of storage or registers called containers.
For example, an instruction that alters a 16-bit word
as data is said to perform work on that container. The
quantity of work performed on the container is 2 works,
consistent with the software physics definition of 1 work for
each 8 bit byte with changed symbol state.

Tables 6.1a and 6.1b show an assignment of codes to the various
AN/UYK-20 containers. Note that these are grouped by
classes derived from container function and location. The
codes have been formulated so that the last digit is the
container length in bytes (8 bit units). Digits to the right
of the decimal are read as eighths of a byte, that is, bits.


These container codes will provide a notational convenience
when we speak of instructions which map operands from a
domain to a range, each being defined by a container type.
Note that instruction fetches as well are interpreted as
work done on a type of container.
CLASS           CONTAINER                      CODE

Storage         Bit                            0.1
                Literal                        0.4
                Byte                           1
                Single Word                    12
                Double Word                    14
                Float Double                   24
                Triple Word                    16
                Interrupt Area                 38
                IOC Command Cells              42
                IOC External Interrupt Area    49

Data Register   General                        102
                General - odd                  112
                General - even                 122
                General - pair even-odd        134

NOTE: Container length indicated by final digit value.
Fractions are eighths of a byte (i.e. bits).

AN/UYK-20 DATA CONTAINERS & CODES
PART I

Table 6.1a
CLASS                  CONTAINER                CODE

Control Register       Program Addr             202
                       Status 1:                318G
                         DMA                    310.1
                         Interrupt I, II, III   311.1
                         FP Round               312.1
                         FP Interrupt           313.1
                         Condition Code         314.1
                         Overflow               315.1
                         Carry                  316.1
                         NDRO                   317.1
                         Stack                  318.1
                       Status 2:                328G
                         Interrupt Code         321
                         Indirect Control       320.2
                       Memory Address           902

Instruction Register   Instruction              500G
                         m-field                501.4
                         a-field                502.4
                         y-field                512

NOTE: Container length indicated by final digit value.
"G" suffix indicates group.
Fractions are eighths of a byte (i.e. bits).

AN/UYK-20 DATA CONTAINERS & CODES
PART II

Table 6.1b


6.3.1.2 CP Software Force Vector Diagrams

The vector nature of CP software work can be illustrated for
individual CP instructions, or potentially for sequences of
instructions, by a graphical device which we will call a Software
Force Vector Diagram. The basic form for these diagrams
is shown in Figure 6.2. The container types listed on the
left (or bottom) of the diagram are for the domain of the
mapping action of an instruction, while those on the right
(or top) are for the range of the mapping. Directed line
segments of types to be listed below are drawn from domain
containers to range containers. These indicate a directed
software force acting on the domain containers. The couplet
$(c_1, c_2)$ composed of the domain and range container codes
indicates the direction of the force, denoted by:

$$f_{t,(c_1,c_2)}$$

where t = I, D, C depending on the type of work performed:

I - Instruction work (fetches, indirect addressing)

D - Data transfer work (operands)

C - Control function work (program-accessed control registers)

The software force of an instruction can thus be represented
by the vector:

$$\left( f_{t,(c_i,c_j)} \right)$$

where t varies over work types and $c_i$, $c_j$ vary over all
container codes.

Work is done when the Software Force acts through a distance
$\ell(c_i, c_j)$ whose magnitude is the length of $c_j$ in bits ÷ 8,
denoted $\ell_j$.

For CP/memory work, we have the scalar quantity,
work, defined by:
Table 6.2 summarizes the types of W(L, y) corresponding to
the states I, D and C, their notation and graphic symbols.

[Figure 6.2: CP Software Forces - Basic Diagram]

[Table 6.2]

The series of Figures 6.3 shows examples of completed CP
Software Work Vector diagrams with directed line segments
indicated as follows:

(a)  cpu/memory  work  -  instruction  (fetch) 

(dashed  line) 

(b)  cpu/memory  work  -  instruction  (indirect  address 
generation  -  one  doubleword  [4  bytes]  fetched  for 
each  level) 

(starred  line) 

(c)  cpu/memory  work  -  data  transfer 

(solid  line) 

(d)  register/register  work  -  instruction  and  data  transfer 

(alternating  dot/dash  line) 

(e)  register/register  work  -  control  function 

(dotted  line) 

6.3.2  Instruction  Class  CP  Power 

6.3.2.1 Decomposition of Power by Class

We can partition the AN/UYK-20 instruction repertoire into
disjoint classes by considering sets of instructions of like
format or like function, or by another characteristic useful
for a specific purpose. The occurrence of only one instruction
per class is the degenerate case. Let $J_i$ denote the i-th
instruction class.

Let L represent some workload in which instructions $s \in J_i$
are executed. Let $S_i$ be the subworkload consisting of all
the executions of $s \in J_i$.
Then:

$$P(L,y) = \frac{W(S_1,y) + W(S_2,y) + \cdots}{\mathrm{Tx}(L,y)} = \frac{W(S_1,y)}{\mathrm{Tx}(L,y)} + \frac{W(S_2,y)}{\mathrm{Tx}(L,y)} + \cdots \tag{6.1}$$


We now use a representative instruction, $s_i$, chosen or
imagined so that:

$$n_i \cdot W(s_i,y) = W(S_i,y) \quad \text{and} \quad n_i \cdot \mathrm{Tx}(s_i,y) = \mathrm{Tx}(S_i,y) \tag{6.2}$$

where $n_i$ is the number of instruction executions of $s \in J_i$
in L.


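For each term $\dfrac{W(S_i,y)}{\mathrm{Tx}(L,y)}$ in (6.1) write

$$\frac{W(S_i,y)}{\mathrm{Tx}(L,y)} = \frac{\mathrm{Tx}(S_i,y)}{\mathrm{Tx}(L,y)} \cdot \frac{W(S_i,y)}{\mathrm{Tx}(S_i,y)} = \frac{\mathrm{Tx}(S_i,y)}{\mathrm{Tx}(L,y)} \cdot P(S_i,y) \tag{6.3}$$

where $P(S_i,y)$ denotes the absolute power of the subworkload, $S_i$.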


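Noting that from the defining relations (6.2):

$$P(S_i,y) = \frac{W(S_i,y)}{\mathrm{Tx}(S_i,y)} = \frac{n_i W(s_i,y)}{n_i \mathrm{Tx}(s_i,y)} = P(s_i,y) \tag{6.4}$$

and since

$$\mathrm{Tx}(L,y) = \sum_i n_i \mathrm{Tx}(s_i,y)$$

we have from (6.3):

$$\frac{W(S_i,y)}{\mathrm{Tx}(L,y)} = \frac{\mathrm{Tx}(S_i,y)}{\mathrm{Tx}(L,y)} \cdot P(S_i,y) = \frac{n_i \mathrm{Tx}(s_i,y)}{\sum_j n_j \mathrm{Tx}(s_j,y)} \cdot P(s_i,y)$$

and so

$$P(L,y) = \sum_i \frac{n_i \mathrm{Tx}(s_i,y)}{\sum_j n_j \mathrm{Tx}(s_j,y)} \cdot P(s_i,y) = \frac{\sum_i \left[ n_i \mathrm{Tx}(s_i,y) \cdot P(s_i,y) \right]}{\sum_j n_j \mathrm{Tx}(s_j,y)} \tag{6.5}$$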


We  have  thus  derived  the  power  of  the  cpu  in  execution  on  the 
workload  L  in  terms  of  class  representative  instruction  counts, 
times  and  absolute  powers. 
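Equation (6.5) is a time-weighted average of the class representative powers. The sketch below shows the computation; the function name and the two-class example values are hypothetical and are not taken from AN/UYK-20 data.

```python
def workload_power(classes):
    """Workload power per equation (6.5).

    classes is an iterable of (n_i, tx_i, p_i) tuples:
      n_i  - number of executions of the class representative
      tx_i - its execution time in seconds
      p_i  - its absolute power in works/sec
    """
    classes = list(classes)
    total_time = sum(n * tx for n, tx, _ in classes)
    total_work = sum(n * tx * p for n, tx, p in classes)
    return total_work / total_time


if __name__ == "__main__":
    # Two made-up classes: a fast, high-power one and a slow, low-power one.
    example = [(10, 1.0e-6, 2.0e6), (10, 3.0e-6, 1.0e6)]
    print(workload_power(example))  # works/sec
```

Because each term is weighted by n_i * Tx_i, a slow class pulls the workload power toward its own power more strongly than its execution count alone would suggest.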

6.3.2.2 Choice of Instruction Classes

What bears further investigation is the way of partitioning
the AN/UYK-20 instruction repertoire into classes which will
be useful in guiding the software design process.

Among the possibilities for defining the classes, $J_i$, are:

(a) $s \in J_i$ iff $P(s,y) = P_i \pm \epsilon_i$

i.e., instructions of approximately like powers.

(b) By instruction format RI, RX, RK, RL or more generally:

(c) Supposing that we have chosen an index set of source
containers, $\{c_{1i}\}$, and an index set of target containers,
$\{c_{2j}\}$, where the $c_{ij}$ are container codes, and that we
represent the components of software force in the direction
of $c_{1i}$ to $c_{2j}$ by $f_{t,(c_{1i},c_{2j})}$, as previously defined.

Then we define

$$s \in J \ \text{iff}\ f_{t,(c_{1i},c_{2j})} \neq 0 \ \text{for some}\ i, j$$

That is, if the instruction s maps a container listed in
$\{c_{1i}\}$ to one listed in $\{c_{2j}\}$, it belongs to the class J.

This type of partitioning would prove useful in choosing
instructions for specific types of arithmetic or logical
functions.
6.3.2.3 The Definition of Tx(s,y)

For an individual instruction $s \in J_i$, the time of instruction
$\mathrm{Tx}(s,y)$ is the sum of the times:

i) $T_I(s,y)$ - The instruction setup/termination time, here
effectively the instruction fetch time. This
is a function of instruction length. Fetches
may be overlapped with execution. In general,
however, the fetch times by format are:

RR, RI, RL - 1 memory cycle
RK, RX - 2 memory cycles

(where an AN/UYK-20 memory access cycle
requires 750 ± 10 nanoseconds.)

ii) $T_D(s,y)$ - The time period commencing with the instruction
access in the CP register by the
macroinstruction-emulating microprogrammed
controller (MPC) to the next instruction fetch.
It includes operand fetches, if they are
required.


When  indirect  addressing  is  in  effect,  the  time  for  additional 
accesses  should  be  added  to  the  time  T^(s,y). 

The instruction execution times quoted in the AN/UYK-20
Technical Description - SPERRY-UNIVAC PX 10431C are based on
actual execution and are composed of $T_D(s,y)$ for the instruction
plus $T_I(s,y)$ for the following instruction in the sequence.
We will assume that these times represent a fair value of
$\mathrm{Tx}(s,y)$ for all instructions.

As an example, consider the
(RI) 02 LOAD DOUBLE
instruction.

$W_I(s,y)$ (instruction fetch) = 2 W

$W_D(s,y)$ (data transfer) = 4 W

(no indirect addressing)

$W(s,y)$ (total) = 6 W

$\mathrm{Tx}(s,y)$ = 2.25 μsec

So absolute instruction power:

$$P(s,y) = (6/2.25) \times 10^{6}\ \text{W/S} \approx 2.67 \times 10^{3}\ \text{KW/S}$$
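The LOAD DOUBLE example can be checked with a few lines of arithmetic. This is a sketch that assumes the quoted execution time is 2.25 microseconds (three 750 nsec memory cycles); the function name is hypothetical.

```python
def instruction_power(fetch_works, data_works, tx_seconds):
    """Absolute instruction power in works/sec: total work over Tx."""
    return (fetch_works + data_works) / tx_seconds


if __name__ == "__main__":
    # (RI) LOAD DOUBLE: 2 works of instruction fetch, 4 works of data.
    p = instruction_power(fetch_works=2, data_works=4, tx_seconds=2.25e-6)
    print(f"{p:.3e} works/sec = {p / 1e3:.0f} KW/S")
```

Under that timing assumption the power comes out near 2.67 million works/sec, the same order of magnitude as the 2000 KW/sec IOC ceiling given in Section 5.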


6.4  AN/UYK-20  CP  POWER  -  IOC  ACTIVITY  INTERACTION 


6.4.1  Introduction 

We will extend the previously derived instruction class power
equation:

$$P(L,y) = \frac{\sum_i \left[ n_i \mathrm{Tx}(s_i,y) \cdot P(s_i,y) \right]}{\sum_j n_j \mathrm{Tx}(s_j,y)} \tag{6.5}$$

to include the effects of delays in CP execution due to memory
access demands from IOC input/output activity.

It will be convenient to rewrite (6.5) in terms of instruction
class work thus:

$$P(L,y) = \frac{\sum_i \left[ n_i \mathrm{Tx}(s_i,y) \cdot \dfrac{W(s_i,y)}{\mathrm{Tx}(s_i,y)} \right]}{\sum_j n_j \mathrm{Tx}(s_j,y)} \tag{6.6}$$

This is done as the contention for memory access will affect all
the terms $\mathrm{Tx}(s_i,y)$ by replacing them with the execution time of
a higher level processor, T, which is in execution whenever either
the CP or the IOC is.


The resultant relative power will be referred to as IOC-Degraded
Workload Power and will be denoted P*(L,y).

We shall see that the degradation will depend on the instruction
formats in execution and on the composition of IOC power by
memory access bandwidth.

We will first develop the execution time of the processor T for
the duration of the instruction $s_i$, $\mathrm{Tx}(s_i,T)$, as a function of
the CP instruction execution time $\mathrm{Tx}(s_i,y)$ and the concurrent
IOC input/output power $P(\phi)$.

6.4.2  MPC  Emulation  of  CP  and  IOC  activity 

AN/UYK-20 execution is driven by a microprogrammed controller
(MPC) and master clock running at 155 ± 5 nsec (denoted $t_c$) per
clock cycle. This is the processor T. The microinstruction
code emulates the CP program macroinstructions and services IOC
memory access requests. The following model will be used to
describe the augmentation of (macro)instruction execution time,
$\mathrm{Tx}(s_i,y)$, due to memory access requests of the IOC.

i) The microprogram, through the use of the "emulate" instruction,
allows an IOC main memory request before each CP instruction
fetch. The "emulate" executed before CP macroinstruction
execution begins will be referred to as an "emulate start."

ii) The sequence of events from the start of a CP instruction fetch to
the fetch of the next instruction is charted as follows:

(a) Start fetch: $T_I$ is the instruction fetch time (cpu/memory
work). It depends on instruction format:
RR, RI, RL - 1 memory cycle
RK, RX - 2 memory cycles

(b) Begin macroinstruction: $T_D$ is the data (operand) action
time.

(c) Possible operand memory references: included in $T_D$; not in RR
instructions.

(d) Resume MPC execution.

(e) Start next instruction fetch.
iii) The software physics $\mathrm{Tx}(s_i,y)$ is the total of instruction
fetch time, $T_I$, and the nominal published execution time, $T_D$.
The effects of indirect addressing or overlap will not be
considered here but can be accounted for by adding increments
to the operand work performed and to the time $T_D$ for processing
and additional fetches.

iv) Any CP macroinstruction main memory reference is preceded
by an "emulate" microinstruction to permit IOC access first.

v) The microcode services all outstanding IOC memory access
requests at any IOC-caused suspension of CP emulation.
Let $n_a$ be the number of microcode instructions required for
a single word access. Then the number of microcode instructions
for a byte mode access is also $n_a$, while the number
of microcode instructions for a double word is $2n_a$.

vi) Additional "emulate" instructions are inserted into CP
emulation sequences to permit IOC memory access service.
Let $n_e$ be the total number of "emulates" (of any type) that
occur in a single macroinstruction microcode execution over
time $\mathrm{Tx}(s_i,y)$.

vii) A return microcode sequence is required whenever the IOC
channel suspension of CP emulation occurs at an "emulate"
which is not an "emulate start" [see i)]. Let $n_r$ be the
number of microinstructions required in the return sequence.

viii) A memory cycle wait of $t_w$ = 800 nsec is required when the
break-in is not on an emulate start.

ix) A memory hold time, $t_h$, is required for each access. The
values of $t_h$ are

input: $t_h$ = 360 nsec

output: $t_h$ = 40 nsec
6.4.3 Assumptions for a Worst-Case Degradation

We will develop a worst case degradation of CP instruction
execution time under the assumption that the probability of a
return sequence being required for each transfer is the same as
the probability that the "emulate" which allowed it is not an
"emulate start." In actuality, more than one transfer can occur
per "emulate" because, if more than one buffer is active, there
is a non-zero probability that there are concurrent requests
outstanding. We will also not consider the effects of chaining
or of instruction fetch overlaps.

We thus have that the probability of a return sequence and of
a memory cycle wait are each:

$$1 - P_{es} = 1 - \frac{1}{n_e}$$

where $P_{es}$ is the probability that an "emulate" is an "emulate
start."

6.4.4 IOC-Augmented MPC Execution Time - Tx*

Let the number of IOC memory access demands/second be denoted
$D_{\phi 8}$, $D_{\phi 16}$ and $D_{\phi 32}$ depending on whether the request is for a
byte (8 bit), single word (16 bit), or double word (32 bit) access,
respectively.
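$$\mathrm{Tx}^{*} = \mathrm{Tx} + \mathrm{Tx}\Big\{ [D_{\phi 8} + D_{\phi 16} + 2D_{\phi 32}]\, n_a t_c \times 10^{-9} + [D_{\phi 8} + D_{\phi 16} + D_{\phi 32}] \Big[1 - \frac{1}{n_e}\Big] n_r t_c \times 10^{-9} + [D_{\phi 8} + D_{\phi 16} + D_{\phi 32}] \Big[1 - \frac{1}{n_e}\Big] t_w \times 10^{-9} + [D_{\phi 8} + D_{\phi 16} + 2D_{\phi 32}]\, t_h \times 10^{-9} \Big\} \tag{6.7}$$

where: Tx* represents the augmented time of macroinstruction
execution due to the $n_a t_c$ nsec microcode execution
times per single word (or byte) access, the $n_r t_c$
nsec return sequence time and cycle wait time, $t_w$,
each with probability $(1 - \frac{1}{n_e})$ for any access, and the
memory hold time, $t_h$.

Note that equation (6.7) is valid only when

$$D_{\phi 8},\ D_{\phi 16}\ \text{and}\ D_{\phi 32} \le \frac{n_e}{\mathrm{Tx}}\ \text{emulates/sec}$$

since accesses are allowed only by virtue of the recurring
"emulate" microinstructions.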


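Denoting the IOC powers for byte, single word and double word
access by $P(\phi_8)$, $P(\phi_{16})$ and $P(\phi_{32})$, respectively (each in
KW/sec), we have:

$$D_{\phi 8} = P(\phi_8) \times 10^{3}, \quad D_{\phi 16} = \frac{P(\phi_{16}) \times 10^{3}}{2} \quad \text{and} \quad D_{\phi 32} = \frac{P(\phi_{32}) \times 10^{3}}{4}$$

Substitution in (6.7) gives:

$$\mathrm{Tx}^{*} = \mathrm{Tx} + \mathrm{Tx}\Big\{ \Big[P(\phi_8) + \frac{P(\phi_{16})}{2} + \frac{P(\phi_{32})}{2}\Big] n_a t_c \times 10^{-6} + \Big[P(\phi_8) + \frac{P(\phi_{16})}{2} + \frac{P(\phi_{32})}{4}\Big] \Big[1 - \frac{1}{n_e}\Big] n_r t_c \times 10^{-6} + \Big[P(\phi_8) + \frac{P(\phi_{16})}{2} + \frac{P(\phi_{32})}{4}\Big] \Big[1 - \frac{1}{n_e}\Big] t_w \times 10^{-6} + \Big[P(\phi_8) + \frac{P(\phi_{16})}{2} + \frac{P(\phi_{32})}{2}\Big] t_h \times 10^{-6} \Big\}$$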
Collecting terms in $P(\phi_8)$, $P(\phi_{16})$ and $P(\phi_{32})$ we have:

$$\mathrm{Tx}^{*} = \mathrm{Tx}\Big\{ 1 + P(\phi_{32}) \Big[ \frac{2 n_a t_c + (1 - \frac{1}{n_e})(n_r t_c + t_w)}{4} + \frac{t_h}{2} \Big] \times 10^{-6} + [2P(\phi_8) + P(\phi_{16})] \Big[ \frac{n_a t_c + (1 - \frac{1}{n_e})(n_r t_c + t_w)}{2} + \frac{t_h}{2} \Big] \times 10^{-6} \Big\} \tag{6.8}$$

and this is valid only when:

$$P(\phi_8) \le \frac{n_e}{\mathrm{Tx}} \times 10^{-3}, \quad P(\phi_{16}) \le \frac{2 n_e}{\mathrm{Tx}} \times 10^{-3} \quad \text{and} \quad P(\phi_{32}) \le \frac{4 n_e}{\mathrm{Tx}} \times 10^{-3}$$

or more concisely when:

$$P(\phi_k) \le \frac{k\, n_e}{8\, \mathrm{Tx}} \times 10^{-3} \quad (k = 8, 16, 32)$$

The term in braces in equation (6.8) is equal to:

$$\mathrm{Tx}^{*}/\mathrm{Tx} = \eta(\phi) = 1/\xi(\phi)$$

where:

$\eta(\phi)$ is called the I/O-Degraded CP Execution Time Factor,
and $\xi(\phi)$ is called the I/O-Degraded CP Execution Power
Factor for the reason that it will appear as a multiplier
of the instruction power $P(s_i,y)$ in the expression for CP
execution power when the IOC is concurrently active.
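The time and power factors can be evaluated directly once the microcode constants are fixed. The sketch below is illustrative only: the function names are hypothetical, the defaults (5 microinstructions for access and for return, a 155 nsec clock, an 800 nsec cycle wait) are the values quoted in Section 6.4.5, and the grouping of terms is an assumption chosen because it reproduces the 830 and 670 coefficients of Restricted Mix 1.

```python
def time_factor(p8, p16, p32, n_e, t_h_ns,
                n_a=5, n_r=5, t_c_ns=155.0, t_w_ns=800.0):
    """I/O-degraded CP execution time factor eta(phi) = Tx*/Tx.

    p8, p16, p32 are IOC powers in KW/sec for byte, single word and
    double word access; times are in nanoseconds; n_e is the number
    of "emulates" per macroinstruction.
    """
    # Expected return-sequence plus cycle-wait cost per access.
    ret = (1.0 - 1.0 / n_e) * (n_r * t_c_ns + t_w_ns)
    per_kw_dword = (2 * n_a * t_c_ns + ret) / 4.0 + t_h_ns / 2.0
    per_kw_word = (n_a * t_c_ns + ret) / 2.0 + t_h_ns / 2.0
    return 1.0 + (p32 * per_kw_dword + (2 * p8 + p16) * per_kw_word) * 1e-6


def power_factor(p8, p16, p32, n_e, t_h_ns, **kwargs):
    """I/O-degraded CP execution power factor xi(phi) = 1/eta(phi)."""
    return 1.0 / time_factor(p8, p16, p32, n_e, t_h_ns, **kwargs)


if __name__ == "__main__":
    # Restricted Mix 1 (n_e = 3), all IOC power in double word mode.
    for p in (100, 500, 1000):
        print(p, round(power_factor(0, 0, p, n_e=3, t_h_ns=360), 3),
              round(power_factor(0, 0, p, n_e=3, t_h_ns=40), 3))
```

Passing the hold time as a parameter keeps the input (360 nsec) and output (40 nsec) cases on the same code path.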

6.4.5  IOC-Degraded  Instruction  Power 


We now have a relative I/O-degraded CP execution power:

$$P^{*}(s_i,y) = P(s_i,y) \cdot \xi_i(\phi) \tag{6.9}$$

where $\xi_i(\phi)$ is the function $\xi(\phi)$ evaluated with the quantity
$n_e$ valid for $s_i$.

We will formulate $\xi_i(\phi)$ for input and output (denoted $\xi_i(\phi_I)$ and
$\xi_i(\phi_O)$, respectively) for two restricted instruction mixes and
a general mix considered typical by the manufacturer. $n_a$ and
$n_r$ are given as 5 microinstructions each for the access and
return microcode sequences; $t_w$, the memory cycle wait time, is
800 nsec; and the memory hold time is 360 nsec for input, 40 nsec
for output.


i) Restricted Mix 1 - RI Add and Logical.

Assume that the instructions executed are limited to 22 RI
Add and 31 RI Logical instructions.

We have from the manufacturer's data:

$\mathrm{Tx}(s_i,y)$ = 1.6 μsec at $t_c$ = 155 nsec

$n_e$, the number of "emulates", is 2 (during $T_D$)
+ 1 (during fetch) = 3

Assuming that the double word power $P(\phi_{32})$ dominates, we
have $P(\phi) \approx P(\phi_{32})$. Equation (6.8) then becomes:

$$\mathrm{Tx}^{*} = \mathrm{Tx}\Big[1 + \Big(650 + \frac{t_h}{2}\Big) \times 10^{-6}\, P(\phi)\Big]$$

so for input,

$$\xi_i(\phi_I)_{\text{mix 1}} = \frac{1}{1 + (650 + \frac{360}{2}) \times 10^{-6}\, P(\phi)} = \frac{1}{1 + 830 \times 10^{-6}\, P(\phi)} \tag{6.10a}$$

and for output,

$$\xi_i(\phi_O)_{\text{mix 1}} = \frac{1}{1 + (650 + \frac{40}{2}) \times 10^{-6}\, P(\phi)} = \frac{1}{1 + 670 \times 10^{-6}\, P(\phi)} \tag{6.10b}$$

ii) Restricted Mix 2 - RX Add and Logical.

Assume that the instructions executed are limited to 22 RX Add
and 31 RX Logical instructions.

From the manufacturer's data:

$\mathrm{Tx}(s_i,y)$ = 2.3 μsec

$n_e$ is 3 (during $T_D$) + 2 (during fetch) = 5

Again assuming that $P(\phi) \approx P(\phi_{32})$, we have:

$$\mathrm{Tx}^{*} = \mathrm{Tx}\Big[1 + \Big(702.5 + \frac{t_h}{2}\Big) \times 10^{-6}\, P(\phi)\Big]$$

so for input,

$$\xi_i(\phi_I)_{\text{mix 2}} = \frac{1}{1 + 882.5 \times 10^{-6}\, P(\phi)} \tag{6.11a}$$

and for output,

$$\xi_i(\phi_O)_{\text{mix 2}} = \frac{1}{1 + 722.5 \times 10^{-6}\, P(\phi)} \tag{6.11b}$$
iii) General Mix

The manufacturer has provided the following instruction mix
in document PX 11901 and considers it typical:

17%   22 RI Adds         (2 emulates in T_D; Tx = 1.6 μsec)
17%   22 RX Add          (3 emulates in T_D; Tx = 2.3 μsec)
17%   31 RI Logical      (2 emulates in T_D; Tx = 1.6 μsec)
17%   31 RX Logical      (3 emulates in T_D; Tx = 2.3 μsec)
12%   44 RI Jumps        (2 emulates in T_D; Tx = 1.3 μsec)
 8%   Miscellaneous      (1 emulate in T_D;  Tx = 1.84 μsec)
 6%   44 RX Jumps        (3 emulates in T_D; Tx = 2.4 μsec)
 4%   26 RX Multiplies   (3 emulates in T_D; Tx = 4.5 μsec)
 1%   26 RI Multiplies   (2 emulates in T_D; Tx = 4.3 μsec)
 1%   27 RK Divides      (3 emulates in T_D; Tx = 7.6 μsec)

We add: 1 emulate in $T_I$ for RI and miscellaneous instructions

or

2 emulates in $T_I$ for RX and RK instructions.

We then have that there are 3.8 emulates in $\mathrm{Tx}(s_i,y)$, the
execution time for the mix representative (average) instruction.

So for this average $s_i$ we have $\mathrm{Tx}(s_i,y) = 2.00 \times 10^{-6}$ sec
and $n_e$ = 3.8.
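The 3.8 figure is the mix-weighted average of emulates per instruction. A sketch of that bookkeeping follows; the table layout and names are hypothetical, while the percentages and emulate counts are those listed for the PX 11901 mix.

```python
# (fraction of mix, instruction format, emulates during T_D)
GENERAL_MIX = [
    (0.17, "RI", 2),    # 22 RI Adds
    (0.17, "RX", 3),    # 22 RX Add
    (0.17, "RI", 2),    # 31 RI Logical
    (0.17, "RX", 3),    # 31 RX Logical
    (0.12, "RI", 2),    # 44 RI Jumps
    (0.08, "MISC", 1),  # Miscellaneous
    (0.06, "RX", 3),    # 44 RX Jumps
    (0.04, "RX", 3),    # 26 RX Multiplies
    (0.01, "RI", 2),    # 26 RI Multiplies
    (0.01, "RK", 3),    # 27 RK Divides
]

# Emulates added during instruction fetch (T_I), by format.
FETCH_EMULATES = {"RI": 1, "MISC": 1, "RX": 2, "RK": 2}


def average_emulates(mix):
    """Mix-weighted average number of emulates per instruction."""
    return sum(frac * (in_td + FETCH_EMULATES[fmt]) for frac, fmt, in_td in mix)


if __name__ == "__main__":
    print(round(average_emulates(GENERAL_MIX), 2))
```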

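Now when $P(\phi) \approx P(\phi_{32})$ we have from equation (6.8):

$$\mathrm{Tx}^{*} = \mathrm{Tx}\Big[1 + \Big(677.7 + \frac{t_h}{2}\Big) \times 10^{-6}\, P(\phi)\Big]$$

From which:

$$\xi_i(\phi_I)_{\text{mix G}} = \frac{1}{1 + 857.7 \times 10^{-6}\, P(\phi)} \tag{6.12a}$$

$$\xi_i(\phi_O)_{\text{mix G}} = \frac{1}{1 + 697.7 \times 10^{-6}\, P(\phi)} \tag{6.12b}$$

If $P(\phi) \approx P(\phi_{16})$, we have:

$$\mathrm{Tx}^{*} = \mathrm{Tx}\Big[1 + \Big(967.9 + \frac{t_h}{2}\Big) \times 10^{-6}\, P(\phi)\Big]$$

From which:

$$\xi_i(\phi_I)_{\text{mix G}} = \frac{1}{1 + 1148 \times 10^{-6}\, P(\phi)} \tag{6.12c}$$

$$\xi_i(\phi_O)_{\text{mix G}} = \frac{1}{1 + 987.9 \times 10^{-6}\, P(\phi)} \tag{6.12d}$$

and finally if $P(\phi_{16}) = P(\phi_{32}) = P(\phi)/2$, we have:

$$\mathrm{Tx}^{*} = \mathrm{Tx}\Big[1 + \Big(967.9 + \frac{t_h}{2}\Big) \times 10^{-6}\, P(\phi_{16}) + \Big(677.7 + \frac{t_h}{2}\Big) \times 10^{-6}\, P(\phi_{32})\Big]$$

$$= \mathrm{Tx}\Big[1 + \Big(484.0 + 338.9 + \frac{t_h}{2}\Big) \times 10^{-6}\, P(\phi)\Big]$$

$$= \mathrm{Tx}\Big[1 + \Big(822.9 + \frac{t_h}{2}\Big) \times 10^{-6}\, P(\phi)\Big]$$

From which:

$$\xi_i(\phi_I)_{\text{mix G}} = \frac{1}{1 + 1003 \times 10^{-6}\, P(\phi)} \tag{6.13a}$$

$$\xi_i(\phi_O)_{\text{mix G}} = \frac{1}{1 + 842.9 \times 10^{-6}\, P(\phi)} \tag{6.13b}$$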


We tabulate and plot the equations (6.13) in Table 6.3 and
Figure 6.4, respectively, giving ξ_i(φ_I) and ξ_i(φ_O), the I/O-
Degraded Execution Power Factors for the SPERRY-UNIVAC PX
11901 General Instruction Mix with the IOC power composition
P(φ_16) = P(φ_32) = P(φ)/2. We additionally show in the table
and graph the curve for the anticipated maximum degradation
arising from the case of equation (6.12c) for input power
when P(φ) = P(φ_16).


  IOC I/O POWER       I/O-Degraded Execution Power Factor ξ_i(φ)
  P(φ)
  (KW/SEC)        Input             Output            Min. ξ_i(φ)
                  P(φ_16)=P(φ_32)   P(φ_16)=P(φ_32)   (maximum degradation)
                  =P(φ)/2           =P(φ)/2           P(φ)=P(φ_16)
  -------------   ---------------   ---------------   ---------------------
      50              0.952             0.960             0.946
     100              0.909             0.922             0.897
     200              0.832             0.856             0.813
     300              0.769             0.798             0.744
     400              0.713             0.750             0.685
     500              0.666             0.704             0.635
     600              0.624             0.664             0.592
     700              0.588             0.629             0.554
     800              0.555             0.597             0.521
     900              0.526             0.569             0.492
    1000              0.499             0.543             0.466
    1200              0.454             0.497             0.421
    1400              0.416             0.459             0.384
    1600              0.384             0.426             0.353
    1800              0.356             0.397             0.326
    2000              0.333             0.372             0.303

RELATIVE CP POWER DEGRADATION - IOC ACTIVITY
Execution Power Factor - ξ_i(φ)

SPERRY-UNIVAC PX 11901 General Mix


Table 6.3
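
The entries of Table 6.3 follow directly from equations (6.13a), (6.13b) and (6.12c). The short Python sketch below evaluates those three expressions for a few power levels; the function and constant names are ours and are only illustrative.

```python
# Coefficients (per KW/sec, scaled by 1e-6) taken from the equations in the text.
COEFF_INPUT = 1003.0   # equation (6.13a), P(phi_16) = P(phi_32) = P(phi)/2
COEFF_OUTPUT = 842.9   # equation (6.13b), same power composition
COEFF_MAX = 1148.0     # equation (6.12c), input with P(phi) = P(phi_16)


def xi(power_kw_per_sec, coeff):
    """I/O-degraded execution power factor: 1 / (1 + coeff * 1e-6 * P)."""
    return 1.0 / (1.0 + coeff * 1e-6 * power_kw_per_sec)


def table_rows(powers):
    """Return (P, input factor, output factor, minimum factor) per power level."""
    return [
        (p, xi(p, COEFF_INPUT), xi(p, COEFF_OUTPUT), xi(p, COEFF_MAX))
        for p in powers
    ]


for p, xi_in, xi_out, xi_min in table_rows([50, 100, 500, 1000, 2000]):
    print(f"{p:5d}  {xi_in:.3f}  {xi_out:.3f}  {xi_min:.3f}")
```

The reciprocal of each value is the corresponding execution time factor η_i(φ).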
6.4.6 Relative IOC-Degraded Power for the Full Workload - P*(L,γ,Γ)


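As a consequence of the above and equation (6.5) we have for
the full workload:

\[
P^*(L,\gamma,\Gamma)
  = \frac{\sum_i n_i\,\mathrm{Tx}^*\,P^*(s_i,\gamma,\Gamma)}{\sum_i n_i\,\mathrm{Tx}^*}
  = \frac{\sum_i \dfrac{n_i\,\mathrm{Tx}(s_i,\gamma)}{\xi_i(\phi)}\;P(s_i,\gamma)\,\xi_i(\phi)}
         {\sum_i n_i\,\mathrm{Tx}(s_i,\gamma)/\xi_i(\phi)}
  = \frac{\sum_i n_i\,\mathrm{Tx}(s_i,\gamma)\,P(s_i,\gamma)}
         {\sum_i n_i\,\mathrm{Tx}(s_i,\gamma)\,\eta_i(\phi)}
\tag{6.14}
\]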


valid  when 

P'V  i  wAizv  *  w's  tk  *  8-  IS-  32> 

X 

where:

Tx(s_i, γ) is the instruction time including fetch,
expressed in seconds;

n_e is the number of microcode "emulate" instructions in the
microcode sequence for a single CP macroinstruction execution;

and

P(φ_k) is the k-bit partial power expressed in KW/sec
(k = 8, 16, 32).

Note that in equation (6.14), the effect of IOC activity is
expressed purely in the denominator as augmentations of the
instruction execution times by the multiplicative factors η_i(φ).
This corresponds to the notion that the CP work done is the same
with or without IOC activity but the effective execution time has
increased.
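
A minimal Python sketch of equation (6.14) may make the point concrete. The instruction counts below are hypothetical, the instruction powers assume 4 works per RI add and 6 works per RX add, and η = 1.100 is the Table 6.4 input entry for 100 KW/S; the function name is our own.

```python
def degraded_workload_power(instructions):
    """Equation (6.14): sum(n * Tx * P) / sum(n * Tx * eta).

    instructions: iterable of (n_i, tx_i in seconds,
                               p_i in works/second, eta_i).
    """
    work = sum(n * tx * p for n, tx, p, _ in instructions)
    degraded_time = sum(n * tx * eta for n, tx, _, eta in instructions)
    return work / degraded_time


ETA = 1.100  # execution time factor, Table 6.4, input at 100 KW/S

# Hypothetical workload: (count, Tx, absolute instruction power, eta)
workload = [
    (1000, 1.6e-6, 4 / 1.6e-6, ETA),  # RI adds, assumed 4 works each
    (500, 2.3e-6, 6 / 2.3e-6, ETA),   # RX adds, assumed 6 works each
]
# Same workload with no IOC activity (eta = 1 for every instruction).
undegraded = [(n, tx, p, 1.0) for n, tx, p, _ in workload]

print(degraded_workload_power(undegraded) / 1e3, "KW/S without IOC activity")
print(degraded_workload_power(workload) / 1e3, "KW/S with IOC activity")
```

Because the same η is applied to every instruction here, the degraded workload power is simply the undegraded power divided by η; with per-instruction factors the weighting in the denominator matters.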

The factor η_i(φ), the reciprocal of ξ_i(φ), is tabulated and
plotted in Table 6.4 and Figure 6.5 for the manufacturer's PX
11901 instruction mix for input and output powers when
P(φ_16) = P(φ_32) = P(φ)/2. In addition, values for the theoretical
maximum degradation are shown occurring for input where
P(φ) = P(φ_16).

  IOC I/O POWER       I/O-Degraded CP Execution Time Factor η_i(φ)
  P(φ)
  (KW/S)          Input             Output            Max.
                  P(φ_16)=P(φ_32)   P(φ_16)=P(φ_32)   P(φ)=P(φ_16)
                  =P(φ)/2           =P(φ)/2
  -------------   ---------------   ---------------   ------------
      50              1.050             1.042             1.057
     100              1.100             1.085             1.115
     200              1.202             1.156             1.230
     300              1.300             1.253             1.344
     400              1.403             1.333             1.460
     500              1.501             1.420             1.575
     600              1.603             1.506             1.689
     700              1.701             1.590             1.805
     800              1.802             1.675             1.919
     900              1.901             1.758             2.033
    1000              2.004             1.842             2.146
    1200              2.203             2.012             2.375
    1400              2.404             2.179             2.604
    1600              2.604             2.347             2.833
    1800              2.809             2.519             3.067
    2000              3.003             2.688             3.300

RELATIVE CP POWER DEGRADATION - IOC ACTIVITY

CP Execution Time Factor - η_i(φ)

SPERRY-UNIVAC PX 11901 General Mix


Table 6.4


6.5  AN/UYK-20  CP  POWER  -  DMA  FACILITY  EFFECTS 

6.5.1  General  DMA  Characteristics 

The  AN/UYK-20  design  incorporates  a  direct  memory  access  (DMA) 
facility  which  allows  an  external  device  to  read  from  and 
write  into  main  memory  via  a  second  memory  interface. 

The  incorporation  of  the  DMA  facility  increases  CP  instruction 
execution  times  by  a  small  amount,  65  nsec  maximum.  The  actual 
increases  in  instruction  execution  time  have  been  tabulated  by 
the  manufacturer  for  each  instruction  in  the  SPERRY-UNIVAC 
publication PX 11772 and will not be repeated here. We will,
however, subscript the symbols γ or Γ in denoted execution times
with the letter D to note the fact that the DMA facility is in
the system and that the values of T or Tx for instructions are
to include the appropriate increment. Thus Tx(s_i, γ_D) is the
instruction execution time when the DMA facility is in the
system.

Another  DMA  feature  provides  for  separate  access  ports  on  each 
of  the  32K  memory  banks.  This  allows  access  by  the  DMA-attached 
device  on  one  bank  concurrent  with  accesses  by  the  CP/IOC  on 
the  other.  Should  requests  for  memory  access  on  the  same  bank 
be simultaneous from the DMA-attached device and the CP/IOC,
priority  of  access  is  given  to  the  latter  units. 

6.5.2  Worst-Case  CP  Degradation  Effects  -  Assumptions 

We will first develop a worst case execution time augmentation
factor μ(φ_D) for DMA facility activity in a memory bank with
concurrent CP/IOC activity. Our assumptions are as follows:

i) Delays in instruction execution memory accesses caused
by the DMA activity occur only when a DMA memory read or
write is already in progress. However, for a worst case
analysis, any derived coincidence of a DMA and CP access
request will be as if the DMA request preceded that from
the CP.

ii) For double word CP accesses, we assume that memory becomes
available between each of the two single word accesses.

iii) The average DMA-caused delay to the CP memory access will
be taken to be one half the memory access cycle time (i.e.,
750 ÷ 2 = 375 nsec). Effectively, each DMA access is for
a 16 bit word.

iv) We will not consider the effects of indirect addressing here.

6.5.3  Development  of  the  DMA-CP  Degradation  Factors 

We first note that the probability of a DMA read/write access
in progress is given by:

\[
\epsilon_D = \frac{\text{memory access time}}{\text{DMA access period}}
           = \frac{t_m\, P(\phi_D) \times 10^{3}}{2}
\]

where: t_m is the memory access cycle time in seconds.

       P(φ_D) is the DMA power in KW/sec.

Now considering the CP related memory fetches (instructions +
operands) as independent attempts at access, we have that the
most probable or expected number of times that the CP would
encounter a DMA access in the course of a single instruction
execution is:

\[ E_{m\gamma} = n_{m\gamma}\,\epsilon_D \]

where: n_mγ is the number of accesses required in the full
(fetch included) execution of an instruction.

Letting n_oγ be the number of effective single word instruction
operands we have for:

  RR, RL, RI-1 instructions,   n_mγ = 1 (fetch only)
  RI-2 instructions,           n_mγ = 1 + n_oγ
  RK instructions,             n_mγ = 2 (fetches only)
  RX instructions,             n_mγ = 2 + n_oγ

We now can write for a degraded CP instruction execution time,
letting Tx stand for Tx(s_i, γ_D) and noting that the average delay
is t_m ÷ 2:

\[
\mathrm{Tx}^* = \mathrm{Tx} + E_{m\gamma}\, t_m / 2
             = \mathrm{Tx} + \frac{n_{m\gamma}\, t_m^{2}\, P(\phi_D) \times 10^{3}}{4}
\tag{6.15}
\]

The second term in (6.15) is the additive DMA Power-Degraded
Instruction Time Augmentation Factor, μ_i(φ_D), where the i
subscript is used to indicate that class of instructions for which
the value of n_mγ is valid.
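
The augmentation factor of equation (6.15) is easy to evaluate numerically. The following Python sketch is a hedged illustration: the function names are ours, the 750 nsec cycle time is the nominal value assumed in the worst-case analysis, and the division by 2 in the busy probability reflects the one-16-bit-word-per-access assumption.

```python
T_M = 750e-9  # nominal memory access cycle time in seconds


def dma_busy_probability(p_dma_kw_per_sec, t_m=T_M):
    """Probability that a DMA access is in progress: t_m * P * 1e3 / 2."""
    return t_m * p_dma_kw_per_sec * 1e3 / 2.0


def mu(n_accesses, p_dma_kw_per_sec, t_m=T_M):
    """Additive time augmentation (seconds per execution), equation (6.15).

    Expected number of collisions (n * probability) times the
    average delay of half a memory cycle.
    """
    expected_collisions = n_accesses * dma_busy_probability(p_dma_kw_per_sec, t_m)
    return expected_collisions * t_m / 2.0


for n in (1, 2, 3):
    print(n, f"{mu(n, 100.0) * 1e9:.2f} nsec at 100 KW/S")
```

The factor is linear in both the number of accesses and the DMA power, so doubling either doubles the added time.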

6.5.4  DMA  Power-Degraded  Instruction  and  Workload  Power 

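We may now write for the degradation of CP instruction execution
power, expressed relative to MPC/DMA execution time:

\[
P^*(s_i,\gamma_D,\Gamma_D)
  = \frac{W(s_i,\gamma_D)}{\mathrm{Tx}^*}
  = \frac{W(s_i,\gamma_D)}{\mathrm{Tx}(s_i,\gamma_D) + \mu_i(\phi_D)}
  = P(s_i,\gamma_D)\,
    \frac{\mathrm{Tx}(s_i,\gamma_D)}{\mu_i(\phi_D) + \mathrm{Tx}(s_i,\gamma_D)}
\tag{6.16}
\]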


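And for the full workload:

\[
P^*(L,\gamma_D,\Gamma_D)
  = \frac{\sum_i n_i\,\mathrm{Tx}^*\,P^*(s_i,\gamma_D,\Gamma_D)}{\sum_i n_i\,\mathrm{Tx}^*}
  = \frac{\sum_i n_i\,W(s_i,\gamma_D)}
         {\sum_i \bigl[\,n_i\,(\mathrm{Tx}(s_i,\gamma_D) + \mu_i(\phi_D))\,\bigr]}
\tag{6.17}
\]

Or in terms of previously derived terms:

\[
P^*(L,\gamma_D,\Gamma_D)
  = \frac{\sum_i n_i\,\mathrm{Tx}(s_i,\gamma_D)\,P(s_i,\gamma_D)}
         {\sum_i n_i\,\mathrm{Tx}(s_i,\gamma_D) + \sum_i n_i\,\mu_i(\phi_D)}
\tag{6.18}
\]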


We will compute the additive time factor μ_i(φ_D) for the previously
described (Section 6.4.5) SPERRY-UNIVAC PX 11901 instruction mixes.
We will use the nominal t_m = 750 × 10^-9 sec for the memory access
cycle time.

i) Restricted mix 1 - RI add and logical.

Assuming that the instructions executed are limited to 22 RI
add and 31 RI logical instructions, we have:

For both instructions n_mγ = 1 (fetch) + 1 (operand) = 2
memory accesses/execution.

From which:

\[
\mu_i(\phi_D) = \frac{2 \cdot (750 \times 10^{-9})^{2} \cdot P(\phi_D) \times 10^{3}}{4}
             = 0.2813\, P(\phi_D) \times 10^{-9}\ \text{secs/execution}
\]


ii) Restricted mix 2 - RX add and logical.

Assuming that the instructions executed are limited to 22 RX
add and 31 RX logical instructions, we have:

For both instructions n_mγ = 2 (fetch) + 1 (operand) = 3
memory accesses/execution.

From which:

\[
\mu_i(\phi_D) = \frac{3 \cdot (750 \times 10^{-9})^{2} \cdot P(\phi_D) \times 10^{3}}{4}
             = 0.4219\, P(\phi_D) \times 10^{-9}\ \text{secs/execution}
\]

iii) PX 11901 General mix.

We have the following numbers of instruction main memory
accesses (single word equivalent) per execution:

  17%   22 RI Adds           (2 accesses/execution)
  17%   22 RX Adds           (3 accesses/execution)
  17%   31 RI Logical        (2 accesses/execution)
  17%   31 RX Logical        (3 accesses/execution)
  12%   44 RI Jumps          (2 accesses/execution)
   8%   Miscellaneous        (est. 2 accesses/execution)
   6%   44 RX Jumps          (3 accesses/execution)
   4%   26 RX Multiplies     (3 accesses/execution)
   1%   26 RI Multiplies     (2 accesses/execution)
   1%   27 RK Divides        (2 accesses/execution)

From the above values we obtain the weighted average number
of accesses n_mγ = 2.44.

From which:

\[
\mu_i(\phi_D) = \frac{2.44 \cdot (750 \times 10^{-9})^{2} \cdot P(\phi_D) \times 10^{3}}{4}
             = 0.3431\, P(\phi_D) \times 10^{-9}\ \text{secs/execution}
\]
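
The weighted average and the resulting coefficient can be verified with a few lines of Python. This is only a sketch: the list layout and function names are ours, and the 8% miscellaneous share uses the estimated 2 accesses per execution given above.

```python
T_M = 750e-9  # nominal memory access cycle time in seconds

# (share of mix, main memory accesses per execution)
ACCESS_MIX = [
    (0.17, 2),  # 22 RI Adds
    (0.17, 3),  # 22 RX Adds
    (0.17, 2),  # 31 RI Logical
    (0.17, 3),  # 31 RX Logical
    (0.12, 2),  # 44 RI Jumps
    (0.08, 2),  # Miscellaneous (estimated)
    (0.06, 3),  # 44 RX Jumps
    (0.04, 3),  # 26 RX Multiplies
    (0.01, 2),  # 26 RI Multiplies
    (0.01, 2),  # 27 RK Divides
]


def average_accesses(mix):
    """Mix-weighted number of memory accesses per instruction execution."""
    return sum(share * n for share, n in mix) / sum(share for share, _ in mix)


def mu_coefficient(n_accesses, t_m=T_M):
    """Seconds of added execution time per KW/sec of DMA power."""
    return n_accesses * t_m ** 2 * 1e3 / 4.0


n_avg = average_accesses(ACCESS_MIX)
print(f"n = {n_avg:.2f}, coefficient = {mu_coefficient(n_avg) * 1e9:.4f}e-9 sec per KW/S")
```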

We now show μ_i(φ_D) tabulated in Table 6.5 and plotted in
Figure 6.6 for n_mγ = 1, 2, 2.44, 3, 4, 5. These values, it
will be recalled, are increments to be added to the times
Tx(s_i, γ_D), the DMA-installed instruction or class representative
execution times, for the computation of augmented DMA
Power-degraded instruction execution times and degraded
relative powers.

COMMON BANK CP INSTRUCTION TIME AUGMENTATION FACTOR - μ_i(φ_D)
(× 10^-9 seconds)

  DMA POWER          Instruction Main Memory Accesses
  P(φ_D)
  (KW/S)        1        2       2.44*      3        4        5
  ---------   ------   ------   ------   ------   ------   ------
      50       7.03    14.06    17.16    21.09    28.13    35.16
     100      14.06    28.13    34.31    42.19    56.25    70.31
     200      28.13    56.25    68.63    84.38    112.5    140.6
     300      42.19    84.38    102.9    126.6    168.8    210.9
     400      56.25    112.5    137.3    168.8    225.0    281.3
     500      70.31    140.6    171.6    210.9    281.3    351.6
     600      84.38    168.8    205.9    253.1    337.5    421.9
     700      98.44    196.9    240.2    295.3    393.8    492.2
     800      112.5    225.0    274.5    337.5    450.0    562.5
     900      126.6    253.1    308.8    379.7    506.3    632.8
    1000      140.6    281.3    343.1    421.9    562.5    703.1
    1200      168.8    337.5    411.8    506.3    675.0    843.8
    1400      196.9    393.8    480.4    590.6    787.5    984.4
    1600      225.0    450.0    549.0    675.0    900.0    1125.
    1800      253.1    506.3    617.6    759.4    1012.    1266.
    2000      281.3    562.5    686.3    843.8    1125.    1406.

* Average for SPERRY-UNIVAC PX 11901 General Mix


AN/UYK-20 RELATIVE CP POWER DEGRADATION - DMA ACTIVITY

Table 6.5


[Figure 6.6: AN/UYK-20 CP Power Degradation - DMA Activity.
Curves by number of instruction memory accesses in Tx;
memory access cycle time = 750 nsec.]


As an illustration of the DMA activity CP degradation effect
in terms of instruction power, let us consider the impact on
the RI and RX Add instructions of our previously employed
restricted mixes.

i) 22 RI Add

We have, as before, n_mγ = 2.

From the instruction specifications in SPERRY-UNIVAC
PX 11772,

\[ \mathrm{Tx}(s_i,\gamma_D) = 1.64 \times 10^{-6}\ \text{sec} \]

So we obtain:

\[
P(s_i,\gamma_D) = \frac{W(s_i,\gamma_D)}{\mathrm{Tx}(s_i,\gamma_D)}
                = \frac{W_I(s_i,\gamma_D) + W_D(s_i,\gamma_D)}{\mathrm{Tx}(s_i,\gamma_D)}
                = \frac{2 + 2}{1.64 \times 10^{-6}} = 2439\ \text{KW/S}
\]

Now, when there is DMA activity:

\[
P^*(s_i,\gamma_D,\Gamma_D) = \frac{W(s_i,\gamma_D)}{\mathrm{Tx}(s_i,\gamma_D) + \mu_i(\phi_D)}
                           = \frac{4}{1.64 \times 10^{-6} + \mu_i(\phi_D)}\ \text{W/S}
\]

ii) 22 RX Add

For this instruction,

\[ n_{m\gamma} = 3 \]

\[ \mathrm{Tx}(s_i,\gamma_D) = 2.40 \times 10^{-6}\ \text{sec} \]

\[ W(s_i,\gamma_D) = 4 + 2 = 6\ \text{works} \]

From which,

\[ P(s_i,\gamma_D) = \frac{6}{2.40 \times 10^{-6}} = 2500\ \text{KW/S} \]

and

\[ P^*(s_i,\gamma_D,\Gamma_D) = \frac{6}{2.40 \times 10^{-6} + \mu_i(\phi_D)}\ \text{W/S} \]

For both of these instructions we tabulate the relative DMA-
Degraded Execution Power P*(s_i, γ_D, Γ_D), shortened to P*_D for
convenience, in Table 6.6. We also define a multiplicative
DMA-Degraded Instruction Execution Power Factor, ξ_i(φ_D) = P*_D / P,
where P is the instruction power when P(φ_D) = 0. This factor,
analogous to the one developed for IOC activity in Section
6.4.4, is also shown in the table and is plotted in Figure 6.7
for the RI ADD instruction.
AD-AU3 103 INSTITUTE FOR SOFTWARE ENGINEERING PALO ALTO CA F/G 9

AN/UYK-20  CONFIGURATION  CAPACITY ►  NELC-UDI-A-106. CU> 

JAN  78  L  M  TRAISTER  N66001-77-C-0252 

UNCLASSIFIED  NL 


COMMON BANK DMA-CP EXECUTION

  DMA POWER     22 RI ADD  P = 2439 KW/S     22 RX ADD  P = 2500 KW/S
  P(φ_D)        P*_D                         P*_D
  (KW/S)        (KW/S)       ξ_i(φ_D)        (KW/S)       ξ_i(φ_D)
  ---------     ------       --------        ------       --------
      50         2418.        0.9914          2478.        0.9913
     100         2398.        0.9832          2457.        0.9827
     200         2358.        0.9668          2415.        0.9660
     300         2320.        0.9512          2375.        0.9499
     400         2282.        0.9356          2336.        0.9343
     500         2246.        0.9209          2298.        0.9192
     600         2211.        0.9065          2262.        0.9046
     700         2178.        0.8930          2226.        0.8904
     800         2145.        0.8795          2192.        0.8767
     900         2113.        0.8663          2159.        0.8634
    1000         2082.        0.8536          2126.        0.8505
    1200         2023.        0.8294          2064.        0.8258
    1400         1967.        0.8065          2006.        0.8025
    1600         1914.        0.7847          1951.        0.7805
    1800         1864.        0.7642          1899.        0.7596
    2000         1816.        0.7446          1850.        0.7399

CP POWER DEGRADATION - DMA ACTIVITY

22 RI, RX ADD INSTRUCTIONS

Table 6.6
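
The Table 6.6 entries can be reproduced from the instruction work, the DMA-installed execution time and the additive factor μ_i(φ_D). The Python sketch below does this for the two add instructions; the function names are illustrative, and the work values (4 and 6 works) and times (1.64 and 2.40 µsec) are those used in the illustration above.

```python
T_M = 750e-9  # nominal memory access cycle time in seconds


def mu(n_accesses, p_dma_kw_per_sec):
    """Additive DMA time augmentation in seconds per execution."""
    return n_accesses * T_M ** 2 * p_dma_kw_per_sec * 1e3 / 4.0


def degraded_power_kw(work, tx, n_accesses, p_dma_kw_per_sec):
    """Relative DMA-degraded instruction power in KW/S."""
    return work / (tx + mu(n_accesses, p_dma_kw_per_sec)) / 1e3


def power_factor(work, tx, n_accesses, p_dma_kw_per_sec):
    """Ratio of degraded power to the power with no DMA activity."""
    undegraded = work / tx / 1e3
    return degraded_power_kw(work, tx, n_accesses, p_dma_kw_per_sec) / undegraded


RI_ADD = dict(work=4, tx=1.64e-6, n_accesses=2)
RX_ADD = dict(work=6, tx=2.40e-6, n_accesses=3)

for p in (50, 200, 1000, 2000):
    print(
        f"{p:5d}"
        f"  {degraded_power_kw(p_dma_kw_per_sec=p, **RI_ADD):7.0f}"
        f"  {power_factor(p_dma_kw_per_sec=p, **RI_ADD):.4f}"
        f"  {degraded_power_kw(p_dma_kw_per_sec=p, **RX_ADD):7.0f}"
        f"  {power_factor(p_dma_kw_per_sec=p, **RX_ADD):.4f}"
    )
```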
6.6 DISCUSSION - CP / I/O INTERACTIONS

6.6.1 Introduction

From models of interaction between the IOC and the CP and the
DMA facility and the CP, we have developed execution time
augmentation factors η_i(φ) and μ_i(φ_D) which lead to the
multiplicative power factors ξ_i(φ) and ξ_i(φ_D). All of these are
expressed as functions of the instruction execution time Tx(s_i, γ)
and the Input or Output powers P(φ_I) or P(φ_O) for the IOC and
P(φ_D) for the DMA. We will present a brief amplification of the
meaning of these factors and a discussion of the significance of the
power factor values derived.

6.6.2  The  Time  Augmentation  Factors 

The factors η_i(φ) and μ_i(φ_D) augment the execution time of a
processor, Γ, which emulates CP and IOC memory access activity.
This processor includes those facilities which, while normally
emulating the CP, must suspend that function and service I/O memory
access requests.

We have defined the execution time of the processor Γ to be
identical with that of the CP when there is no IOC or DMA
activity, i.e.,

\[
\mathrm{Tx}(L,\Gamma) = \mathrm{Tx}(L,\gamma), \qquad
\mathrm{Tx}(L,\Gamma_D) = \mathrm{Tx}(L,\gamma_D)
\qquad \text{when } P(\phi) = P(\phi_D) = 0
\tag{6.19}
\]

The time augmentation factors η(φ) and μ(φ_D) increase Tx(L,Γ)
or Tx(L,Γ_D) over their γ-processor (CP) based execution times;
η operates by multiplication and μ by addition because of the
differences in the models from which each factor is derived.


When there is input/output activity from either the IOC or DMA
we do not increase Tx(L,γ) because the γ-processor is defined
to be stopped when:

(a) The MPC is servicing I/O requests for memory access.

(b) CP memory access is blocked because an IOC or DMA
access is in progress.

Thus the absolute power, P(L,γ), of the CP has not changed; it
is the CP power relative to the Γ-processor, P*(L,γ,Γ), which
decreases with I/O activity. This distinction is emphasized
because the instruction or workload power equations (6.9),
(6.14), (6.16), (6.17) and (6.18) express time in terms of
Tx(s_i, γ), instruction execution time for the CP.

6.6.3  The  Power  Degradation  Factors 

The factors ξ_i(φ) and ξ_i(φ_D) each are the ratio of the relative
power of an instruction when there is IOC or DMA power in use
to that when there is no IOC or DMA activity. Because of the
relationships (6.19) we have:

\[
P^*(s_i,\gamma,\Gamma) = P(s_i,\gamma), \qquad
P^*(s_i,\gamma_D,\Gamma_D) = P(s_i,\gamma_D)
\qquad \text{when } P(\phi_D) = P(\phi) = 0
\tag{6.20}
\]

Because of the relationships (6.20) we were able to develop
the relative power ratios ξ by considering only the absolute
CP instruction execution powers and are able to express the
full workload relative powers degraded by I/O activity (equations
(6.14), (6.18)) in terms of the absolute CP instruction
execution powers.


6.6.4  Significance  of  the  Power  Factor  Values 

Inasmuch as the power factors ξ show degradation of relative
instruction and workload power for the CP when there is I/O
activity, they may be thought of as reductions to the instruction
throughput caused by the interactions described by the
models.

The IOC activity, in particular, was theoretically shown to
reduce the instruction relative power P*(s_i,γ,Γ) to as little
as 30% of its non-I/O active value for maximum IOC power (Table
6.3). By contrast, access through the DMA facility on a common
memory bank with the CP degrades P*(s_i,γ_D,Γ_D) to no less than
about 74% of its non-DMA active level over the range tabulated
(Table 6.6). These differences can be explained
by the demands that the IOC makes on the microprogrammed
controller (MPC) for servicing its memory access requests. The
DMA facility, on the other hand, requires that devices using
it must provide their own memory interface logic through
additional low priority ports to the 32K memory banks. Since
these ports are of lower priority than those for the CP/IOC,
the likelihood of overruns on DMA connected devices is
substantial. In fact, it is doubtful that one could achieve in
practice DMA input/output power levels comparable to those for
the IOC channels without experiencing frequent overruns.

System and program designers must be aware of the consequences
of the CP-IOC and CP-DMA interactions in regard to the effects
they can have on workload throughput and system response times.
These considerations are a substantial portion of the factors
that would determine the performance characteristics of a
workload distributed over AN/UYK-20 configurations. In turn
we could expect that performance requirements, the knowledge
of acceptable trade-offs and the availability of processor
power characteristics would provide the basis for satisfactorily
performing AN/UYK-20 configurations.

ASYMPTOTIC POWER

See the discussion under the entry P(A).


CAPACITY 

Two  forms  of  computing  system  capacity  are  identified  in  software 
physics:  1)  processor  capacity,  expressed  in  units  of  software 
power  (works/second) ,  and  2)  storage  capacity,  expressed  in  units 
of  byte-seconds  or  their  equivalent  on  non-byte  computer  systems. 

In  either  case,  the  quantity  determined  to  be  the  capacity  of  the 
system  must  be  calculated  from  theoretical  considerations,  and 
cannot  be  obtained  directly  by  measurement.  Measured  values  repre¬ 
sent  the  quantity  used,  not  the  available  quantity  of  power  or 
byte-seconds. 

Both  processor  capacity  and  storage  capacity  can  be  determined  as 
appropriate for individual devices, subconfigurations, or the full
system  configuration.  In  general,  processor  capacity  is  primarily 
a  function  of  equipment  speeds  and  configuration  connections,  and 
secondarily  a  function  of  workload  characteristics.  Storage  capa¬ 
city  is  simply  the  total  storage  available  by  equipment  class  or 
subconfiguration  over  time. 

The  amount  of  power  actually  used  is  the  quantity  normally  called 
performance.  Thus,  processor  capacity  and  performance  are  directly 
relatable  quantities:  one  is  the  power  available,  the  other  is  the 
power  used.  The  ratio  of  performance  to  capacity  is  called  the 
efficiency  of  the  workload. 

See  the  power  entry  for  a  more  detailed  discussion  of  capacity. 

CONFIGURATIONS  AND  SUBCONFIGURATIONS 

A  configuration  is  an  arbitrary  collection  of  processors  and  storage 
devices,  normally  connected  so  that  processors  cause  data  to  flow  to 
and  from  storage  devices.  In  certain  applications  of  software  physics, 
however,  one  is  not  limited  to  fully  connected  configurations. 


A subconfiguration is a configuration within a configuration. Often,
the prefix "sub" is not used when dealing with parts of a full
configuration. For example, "channel configuration" is the collection
of a channel, control units, and the drives (printers, terminals,
etc.) which can be addressed through the channel. This configuration
is part of the I/O configuration, which is the set of all such channel
subconfigurations. The "full configuration" is a special term which
includes all processors (cpu and I/O) and all storage devices under
consideration.

Greek letters are used in software physics to represent
configurations and subconfigurations. See the software physics
notation entry for the symbols used for the standard configurations
and subconfigurations.


CONTAINER  (STORAGE) 

A  container  is  a  portion  of  a  storage  medium  which  can  be  separ¬ 
ately  addressed.  Its  size  is  measured  in  the  number  of  bytes 
which  are  contained  in  the  container,  for  byte-oriented  machines. 

An  8-bit  container  has  been  arbitrarily  designated  as  the  stand¬ 
ard  size  container,  in  software  physics.  As  a  result,  non-byte 
oriented  machine  container  sizes  are  determined  by  dividing  the 
number  of  bits  by  eight.  Containers  such  as  cards  and  print  posi¬ 
tions  on  paper  are  counted  as  having  a  size  equal  to  the  number 
of bytes (or bits ÷ 8) required to either read or write a character.

DEVICES

Devices are considered as either processors or storage devices.
Generally speaking, a device is the lowest level component of a
configuration or subconfiguration. For example, a tape drive is
considered as a device, as is a tape drive control unit. Together,
these devices would make up a tape control unit subconfiguration.
Extending this, a channel is considered as a device,
but the collection of channel, control unit, and tape drives would
be a channel subconfiguration.

The lowest level of a configuration or subconfiguration is still a
"configuration." In software physics, configurations and
subconfigurations are generally denoted by a Greek symbol. In
particular, because devices are the lowest level of a configuration
they are normally symbolized by Greek letters; e.g., δ is a drive,
γ is a cpu. However, to avoid confusion between channel devices and
configurations, a lower case a is used to denote the device, Greek α
is used to denote the channel configuration with attached control
unit subconfigurations. Similarly, b is used to denote the control
unit as a device, β is used to symbolize the configuration with
attached drives.

FORCE,  SOFTWARE 

In  general,  a  force  is  the  agent,  "mechanism,"  or  method  by  which 
energy  is  converted  to  work.  So  closely  are  the  concepts  of  force 
and  energy  linked  that  nearly  two  centuries  elapsed  after  Newton 
before  a  distinction  was  commonly  made  between  them  in  the  classical 
physics.  In  software  physics,  the  means  by  which  software  energy 
results  in  software  force  is  through  the  agency  of  an  instruction, 
either  cpu  or  I/O.  As  a  result,  software  physics  considers  an 
instruction  as  formally  representing  a  force.  A  unit  of  software 
is  a  collection  of  instructions  and  associated  operands,  where 
operands  can  be  considered  as  being  the  software  physics  analogies 
of  inertial  mass.  Together,  the  result  is  that  a  software  unit  is 
considered  equivalent  to  the  classical  physics  system  of  forces 
acting  on  point  masses. 

Software  force  is  measured  in  units  of  work/byte.  The  direction  of 
action  of  a  software  force  is  from  the  container  accessed  to  the 
container  receiving  the  symbols  transferred.  Software  force  is  a 
vector  quantity  when  a  given  instruction  transfers  symbols  from 
more  than  one  set  of  source-target  containers. 
MBTR (MAXIMUM BYTE TRANSFER RATE)

This is the rate at which I/O data is read or written, excluding all
time required to position or otherwise locate data. It is normally
given by equipment manufacturers as either the rated speed (e.g.,
2000 lines per minute) or data transfer rate (e.g., 806,000 bytes
per second for a 3330 disk drive). In practice, this data transfer
rate is never achievable except for burst rate conditions due to
a variety of set-up and positioning time requirements. When
thought of in terms of software work rather than bytes, this value
is also called the asymptotic power of the device.

P(A) (ASYMPTOTIC POWER)

The  asymptotic  theoretical  power  or  capacity  of  a  configuration 
is  symbolized  as  the  vector  quantity  P(A).  Its  elements  are  the 
asymptotic  powers  of  the  ideal  configuration,  by  equipment  class, 
denoted  P^.  Asymptotic  power  is  equivalent  to  the  maximum  byte 
transfer rate, converted to units of software work and power,
available  from  the  equipment  being  considered.  For  subconfigura¬ 
tions  including  disks  and/or  tapes,  the  asymptotic  power  is  cal¬ 
culated  using  the  first  level  of  "bottlenecking." 

Asymptotic  power  calculations  for  disk,  tape,  and  other  variable 
block  length  devices  assume  an  infinite  block  length.  For  fixed 
block  length  devices,  such  as  printers,  the  maximum  byte  transfer 
rate  or  its  equivalent  as  specified  by  the  manufacturer  and  con¬ 
verted  to  units  of  power  is  used;  e.g.,  2000  lines  per  minute. 

For  central  processors,  the  asymptotic  power  is  calculated  by 
dividing  the  bytes  accessed  from  buffer  or  main  storage  by  the 
smallest  corresponding  cycle  time. 

PERFORMANCE 

In common usage this term is not associated with any specific
quantitative value. In software physics, the word is fully
equivalent to the word software power, and is normally associated
with the power usage of the workload, P(L,ψ). This is called
throughput power. If the batch workload alone is considered, then
throughput power is completely equivalent to the common measure
"throughput"; i.e., "jobs" per hour. More generally, however,
performance may be defined for any level of configuration and/or
subworkload by taking the appropriate power usage measure.

The  quantitative  definition  of  performance  as  the  level  of  software 
power  usage  is  fully  in  accord  with  the  intuitive  meaning  of  the 
word,  but  its  use  in  this  sense  requires  one  clarification.  When 
a  portion  of  the  workload  is  removed,  such  as  may  be  done  by 
changes  to  the  operating  system  or  using  TSA  runs,  capacity  is 
recovered.  But  capacity  is  the  power  available  for  use  by  the 
workload.  As  such,  the  power  usage  level  (performance)  may  decrease 
quantitatively.  Generally,  a  portion  of  the  recovered  power  will 
go  into  the  workload,  and  a  decrease  in  elapsed  time  will  occur. 
However,  the  reduction  in  time  may  not  be  proportional  to  the 
reduction  in  software  work.  Thus,  recovering  capacity  by  reducing 
the  quantity  of  software  work  to  be  performed  may  result  in  a 
decrease  in  the  level  of  power  usage  even  though  a  reduction  in 
elapsed  time  is  also  observed. 

Performance is directly related to workload efficiency, as the
latter is equal to P(L,ψ) ÷ P(C). Thus, for a given value of
P(C), one may quote performance either in units of power or
in percent efficiency.


POWER,  SOFTWARE 

Software  power  is  the  link  between  the  quantity  of  software  work 
to  be  performed  and  the  time  required  to  accomplish  it.  The  term 
may  be  used  in  either  of  two  senses:  1)  the  power  used,  formally 
defined  as  the  work  performed  divided  by  the  time  to  accomplish  it, 
or 2) the power available from a device, configuration, or equipment
class, calculated from theoretical considerations. When used
in the first sense, power is equivalent among other things to the
concept of performance. In the second sense, power is equivalent
to the concept of capacity.

Power usage is defined as the ratio of work performed to the time
required to perform it. Time, however, can be measured in a
variety of ways. If the execution time of a device, subconfiguration,
or equipment class is used, then the power value calculated
is the work performed by these divided by the execution time of
the equipment. The execution time is often less than the externally
observed elapsed time of the full configuration processing the full
quantity of work. This results in two possible ways of calculating
power usage level:

1) The power used by a device, subconfiguration, or equipment
class relative to the full configuration elapsed time.

This is called the relative power.

An example would be the relative cpu power, P(L,γ,ψ) =
W(L,γ) ÷ Tx(L,ψ). It represents the work performed by the
cpu during the entire period of time required to process
the workload L, which would normally include some time
when the cpu was not executing any instructions.

2) The power used by a device, a subconfiguration, or equipment
class when and only when the corresponding processors are
in execution. That is, the overall elapsed time of higher
level systems is of no concern in this calculation, only
the absolute time of execution of the processors being
considered. This is called the absolute power. Using the
example of the cpu again, the absolute power of the cpu
is the work performed divided by the seconds of cpu
execution time required to do so. Symbolically, one has
P(L,γ) = W(L,γ) ÷ Tx(L,γ).

Relative power is related to absolute power by the corresponding
percent utilization factor. In the cpu example, the relative
cpu power equals the product of the percent cpu utilization
and the absolute cpu power. Since cpu utilization equals
Tx(L,γ) ÷ Tx(L,ψ), one has P(L,γ,ψ) = (% cpu utilization) × P(L,γ).
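
The relationship between relative power, absolute power, and utilization can be checked numerically. The following is a minimal sketch, not part of the original report; the function names and the workload figures are hypothetical and chosen only for illustration.

```python
import math

def absolute_power(work, exec_time):
    """Absolute power: work divided by the device's own execution time."""
    return work / exec_time

def relative_power(work, elapsed_time):
    """Relative power: work divided by the full configuration elapsed time."""
    return work / elapsed_time

def utilization(exec_time, elapsed_time):
    """Fraction of the full configuration elapsed time the device was executing."""
    return exec_time / elapsed_time

# Hypothetical figures (assumptions, not taken from the report).
cpu_work = 6_000_000.0   # cpu work of the workload, in works
cpu_exec = 20.0          # cpu execution time, in seconds
elapsed = 50.0           # full configuration elapsed time, in seconds

p_abs = absolute_power(cpu_work, cpu_exec)
p_rel = relative_power(cpu_work, elapsed)
util = utilization(cpu_exec, elapsed)

# Relative power equals utilization times absolute power.
print(p_abs, p_rel, util, math.isclose(p_rel, util * p_abs))
```

With these figures the cpu is busy 40 percent of the elapsed time, so its relative power is 40 percent of its absolute power.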

The  absolute  power  usage  arises  from  the  equipment  speeds  and  basic 
workload  parameters  such  as  instruction  mix  and  block  sizes. 

Relative  power  usage  levels  reflect  absolute  power  parameters,  the 
proportions of power between subconfigurations and equipment
classes,  and  the  ratios  of  work  to  be  performed  in  these  subsystems. 
That  is,  relative  power  usage  levels  reflect  both  basic  workload 
parameters  and  the  "fit"  between  the  workload  and  the  computing 
system. 

Since  absolute  power  calculations  do  not  require  knowledge  of  the 
overall  workload  characteristics,  the  theoretical  absolute  power 
available  from  a  device,  subconfiguration  or  equipment  class  can 
be  calculated.  The  theoretical  considerations  include  the  factors 
which  in  general  can  affect  the  absolute  power  levels  attainable 
from  the  equipment.  As  such,  they  identify  the  effect  of  changes 
and  provide  a  means  of  determining  the  power  loss  due  to  the  way 
the  equipment  is  being  used.  For  a  given  quantity  of  work,  it  is 
then  possible  to  calculate  the  changes  in  execution  time  which 
will  result  from  a  new  level  of  absolute  power  usage  obtained  by 
altering  the  manner  of  equipment  use. 

Relative  power  involves  the  use  of  elapsed  time,  and  overall 
elapsed  time  can  be  predicted  from  a  knowledge  of  the  absolute 
power  usage  levels  attainable  and  the  quantities  of  work  to  be 
performed  by  equipment  class  and/or  subconfiguration.  This  is 
accomplished  using  a  full  workload  characterization  with  "offset" 
information.  Offsets  represent  the  quantity  of  cpu  work  to  be 
performed  before  a  quantity  of  I/O  work  can  be  performed.  They 
are established empirically for a given workload from execution
time profiles and their equivalent form, work concurrency charts.

For  additional  information,  see  the  corresponding  entries  in  this 
glossary. 

The comparison of actual power levels to theoretical power levels,
and the determination of the relative importance of the factors
degrading actual power from the maximum attainable power, provides
the knowledge necessary to formulate an installation performance
improvement plan. Inherent in such a plan would be a recognition
of the cost-effectiveness of various possible performance improvements,
and trade-offs between these activities and additional
equipment plans. The same knowledge needed for this plan is
necessary to correctly evaluate and understand the effects of
possible new equipment on performance.

PROCESSORS 

A  processor  is  any  collection  of  digital  circuitry  which  is  capable 
of  accepting  an  instruction  and  executing  it;  i.e.,  generating  the 
set of logical state changes represented symbolically by the instruction.
Typical processors are cpu's, disk drives, tape drives,
printers,  terminals,  various  control  units,  channels,  etc. 

Processors of interest need not be separately packaged as distinct
physical entities. For example, disk drives
often come two or more to a package. For this reason, a subconfiguration
composed of several different device-level processors can
also be considered as a single processor. For example, a channel
device  with  attached  control  units  and  disk  drive  devices  can  be 
treated  as  if  it  were  a  single  processor.  Such  processors  are 
called  equivalent  processors  when  necessary  to  distinguish  between 
processors  packaged  in  a  single  box  and  processors  whose  circuitry 
is  distributed  among  several  boxes. 

Additionally,  the  ability  of  a  configuration  to  handle  a  forecasted 
workload  needs  to  be  determined.  It  is  much  more  convenient  for 
these purposes to use an equivalent form of an execution time profile
expressed in terms of software work rather than time. Such
a chart is called a work concurrency chart. It is constructed by
multiplying the execution time components of the profile by the
corresponding actual absolute power levels. The overall PERT
structure  of  the  profile  is  reflected  into  the  work  concurrency 
chart  by  offsetting  I/O  work  by  equipment  class  by  an  amount  equal 
to  the  cpu  work  corresponding  to  the  time  the  cpu  is  in  execution 
but  not  the  equipment  class.  These  quantities  of  cpu  work  are 
called  "cpu  offsets",  and  one  per  equipment  class  is  calculated. 

Given a work concurrency chart representing the typical offsets
found in an installation workload, changes in the quantities of
work by equipment class and/or observed absolute power levels are
easily translated to an execution time profile. More importantly,
workload forecasts can be translated to execution time profiles.

The  elapsed  time  of  the  new  system  with  the  new  or  forecasted 
workload  is  also  easily  calculated.  The  new  percent  utilizations 
are  also  predicted  by  the  same  calculations. 
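
The conversion between execution time components and quantities of work can be sketched as follows. This is an illustrative assumption-laden example, not the report's procedure: the dictionary layout, function names, and all numbers are hypothetical, and the sketch deliberately omits the cpu offsets and PERT structure needed to predict overall elapsed time.

```python
def to_work(exec_times, abs_power):
    """Multiply each execution time component (seconds) by the
    corresponding absolute power (works/second) to obtain work."""
    return {cls: exec_times[cls] * abs_power[cls] for cls in exec_times}

def to_exec_times(work, abs_power):
    """Translate quantities of work back into execution times."""
    return {cls: work[cls] / abs_power[cls] for cls in work}

# Hypothetical observed profile and absolute power levels.
profile = {"cpu": 20.0, "disks": 30.0}              # seconds
power = {"cpu": 300_000.0, "disks": 100_000.0}      # works per second

work = to_work(profile, power)

# Hypothetical forecast: disk work grows by 50 percent while a
# faster drive doubles the absolute disk power.
forecast_work = {"cpu": work["cpu"], "disks": work["disks"] * 1.5}
new_power = {"cpu": 300_000.0, "disks": 200_000.0}

new_profile = to_exec_times(forecast_work, new_power)
print(work, new_profile)
```

Only per-class execution times are recomputed here; combining them into an elapsed time would additionally require the offset information described above.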

RESPONSE  TIME 

Response time is normally associated with the elapsed time between
inputting a command or inquiry to an on-line system and receiving a
response. Since the work to be performed is a function of the
nature of the input, two major techniques for determining response
time are used: 1) the time for a standard input or set of inputs
is measured, or 2) a percentile of actual response times is chosen,
e.g., "85% of all response times are equal to or less than 5 seconds."
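
The percentile technique can be illustrated with a short sketch. The nearest-rank percentile rule used here is one common convention and is an assumption, as are the function name and the sample response times; the report does not prescribe a particular method.

```python
import math

def percentile_response_time(times, pct):
    """Nearest-rank percentile: the smallest observed time t such that
    at least pct percent of the observations are less than or equal to t."""
    ordered = sorted(times)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Hypothetical measured response times, in seconds.
samples = [1.2, 0.8, 4.9, 2.5, 3.1, 7.4, 1.9, 2.2, 3.8, 0.9,
           1.1, 2.8, 4.2, 1.5, 6.0, 2.0, 3.3, 1.7, 2.6, 4.5]

p85 = percentile_response_time(samples, 85)
share = sum(t <= p85 for t in samples) / len(samples)
print(p85, share)
```

For these samples one could state that 85% of all response times are equal to or less than the value of `p85`.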

In software physics, response time is clearly a function of the
work vector corresponding to the input, and the vector power
delivered into the on-line system on behalf of the input. Conceptually,
given these two quantities, determination of response time is
a straightforward calculation. In practice, these quantities are
often  difficult  if  not  impossible  to  obtain  due  to  lack  of  proper 
instrumentation.  However,  it  is  interesting  to  note  that  queuing 
theory  parameters  needed  for  response  time  prediction  are  generally 
adequate  for  software  physics  purposes  as  well. 

SOFTWARE PHYSICS

Software physics is the study of the quantitative and measurable
properties of executable instructions and their operands, and
their interactions with computing systems equipment and configurations.

SOFTWARE  PHYSICS  NOTATION 

Software  physics  uses  a  special  form  of  notation  designed  to 
identify  three  items  of  interest: 

1) the property to be measured. The properties of general
interest are software work (W), execution time (Tx),
elapsed time (Te), storage occupancy (S), available store
(Z), power (P), storage capacity usage (Ca), and intensity
(I).

2) the unit of software whose executable code and/or operand
properties are to be measured. The symbols used are S
to represent a general software unit and L to represent
that software unit representing the full workload. Subscripts
are used to denote constituent software units and/or
subworkloads. For an actual software unit, called say
"Job XYZ", the actual name "XYZ" would be used instead of
S. Similarly, the word "Batch" might be used for the batch
subworkload.

3) the set of equipment over which the value of a desired
property is to be obtained. Either a configuration or
subconfiguration, or an equipment class (but not both
unless they are identical) may be specified. Configurations
are identified by lower case Greek letters, equipment
classes by abbreviations. Typical configuration symbols
used are ψ for the full configuration, γ for the cpu, $
for I/O, α for a channel subconfiguration, 6 for a control
unit subconfiguration, and 5 for a drive. Equipment class
abbreviations are cpu for central processor, disks for disk
drives, tapes for tape drives, ptr for high speed printers,
etc.

The structure of the notation permits a desired measurement to be
symbolized precisely. The property to be measured is given first,
followed in parentheses by the software unit(s) to be measured
and then by the configuration(s) or equipment classes to be
measured. If the property is to be measured for more than one
software unit, both are given, in the order of their occurrence in
the corresponding equation. Similarly for configurations and
equipment classes.

Vector representations are denoted by an arrow over the property
symbol. Examples:

Tx(L,ψ): the execution time of the full workload on the full
configuration.

Tx(L,disks): the execution time of the disk drive equipment
class for the full workload.

W(L,ψ): the software work of the full workload on the full
configuration (a single number).

W⃗(L,ψ): a vector representation of the quantity W(L,ψ).

P(L,γ,ψ) = W(L,γ) ÷ Tx(L,ψ): the relative cpu power.

SOFTWARE PHYSICS PROPERTIES

The term "properties" denotes a measurable, quantitative characteristic
of software units and/or computing configurations or devices.
There are three fundamental properties: software work, execution
time, and storage occupancy. These and only these properties (or
their equivalents, energy, time, and available storage) are used
in software physics. Certain other important properties are
derived using the fundamental properties. These are called
derived properties, and include power, storage capacity usage,
intensity, force, and distance.

To obtain an actual measurement of some property, both a unit of
software  and  the  computing  equipment  must  be  specified.  These 
are  called  software  physics  systems,  and  the  property  is  a 
characteristic  of  the  systems  being  measured. 

See  the  corresponding  glossary  entries  for  more  discussion. 


SOFTWARE  UNITS  AND  WORKLOADS 

A software unit is a basic system of interest in software physics.
It is formally defined as an arbitrary collection of executable
(object) code and its associated operands. A software unit, therefore,
corresponds to a program with its associated data, an application
and data, the full workload and data, and even a single
instruction and its operands. Since it is defined in such a
general  way,  software  physics  theory  requires  that  any  statement 
made  about  a  general  software  unit  be  true  for  all  software  units. 
Also,  since  a  software  unit  can  be  a  single  executable  instruction 
and  its  operands,  this  requires  that  only  those  quantitative 
properties  displayed  by  such  a  software  unit  can  be  associated 
with  all  software  units. 

A  workload  is  a  special  software  unit  only  in  the  sense  that  it 
represents  the  total  collection  of  executable  code  and  data  over 
some period of time. Because of this, however, certain statements
and equations true for the total workload may not be true for all
software units. The reverse, of course, is true; i.e., any statement
about a software unit holds for workloads as well.

TIME 

Time  is  a  basic  property  of  software  physics.  Its  unit  of  measure 
is  seconds,  as  measured  by  a  standard  clock.  Time  is  a  measure  of 
state  change  processes,  and  a  standard  wall  clock  is  assumed  to 
be  measuring  universal  state  changes.  In  software  physics,  the 
basic time quantity of interest is called execution time, symbolized
as Tx. For a given software physics system, Tx is increased if
and only if an instruction is being executed by some processor.
If this condition is not true, then even though the wall clock
time may increase, the corresponding execution time increase will
be equal to zero.

For example, a unit of software S may be in execution on a
configuration ψ from t₁ to t₂. Its execution time during this
period is Tx₁(S,ψ) = t₂ - t₁. At time t₂, the software unit is
capable of being executed (is dispatchable), but is "involuntarily"
caused to wait until time t₃. The execution time during this
period t₃ - t₂ is equal to zero; i.e., Tx₂(S,ψ) = 0. From t₃ to t₄,
S is again in execution; Tx₃(S,ψ) = t₄ - t₃. If S is now completed,
the total execution time would be Tx₁(S,ψ) + Tx₂(S,ψ) + Tx₃(S,ψ) =
(t₂ - t₁) + 0 + (t₄ - t₃).
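
The accumulation of execution time over separate periods of execution can be sketched in a few lines. This is an illustrative example only; the function name and the four time stamps are hypothetical values, not measurements from the report.

```python
def execution_time(intervals):
    """Sum the lengths of the (start, end) periods during which the
    software unit was actually in execution. Waiting periods are
    simply absent from the list and therefore contribute zero."""
    return sum(end - start for start, end in intervals)

# Hypothetical time stamps in seconds: executing from t1 to t2,
# involuntarily waiting from t2 to t3, executing again from t3 to t4.
t1, t2, t3, t4 = 0.0, 3.0, 5.0, 9.0

executing = [(t1, t2), (t3, t4)]
tx = execution_time(executing)      # (t2 - t1) + 0 + (t4 - t3)
wall = t4 - t1                      # wall clock time over the same span
print(tx, wall, wall - tx)
```

The difference between the wall clock span and the accumulated execution time is the time the unit spent waiting.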

A second form of time, called elapsed time and symbolized as Te,
is also of interest. Elapsed time is not an independent quantity,
as it is defined in terms of execution time. Formally, the basic
definition is that Te(L,ψ) = Tx(L,ψ). That is, the software
physics elapsed time excludes any pure idle time; i.e., Te is a
measure of state change processes which occur within the entire
configuration. If no instruction (cpu or I/O) is being executed, then
the change in Te is zero even though the wall clock time is being
incremented. The difference between wall clock time and elapsed
time is called idle time.

Elapsed time measures occupancy of storage; execution time
measures instruction execution time within the configuration,
subconfiguration, or equipment class of interest. The elapsed
time of a given unit of software S is measured by the changes in
the quantity Te(L,ψ) from the point in wall clock time that S
occupies storage and is capable of being executed until it has been
completely executed and no longer occupies storage. This quantity
would be symbolized as Te(S,ψ). By definition, it will always be
true that Te(S,ψ) ≥ Tx(S,ψ).

Pure  idle  time  is  not  used  directly  in  software  physics  equations, 
except  that  it  represents  power  that  could  have  been  delivered  and 
was  not.  As  such,  when  determining  the  capacity  remaining  on  a 
configuration, pure idle time must be considered. Since pure idle
time is equal to wall clock time minus execution time, the
remaining capacity is always a function of the "scheduled-on time"
for the computing system.

WORK, SOFTWARE

Software work is one of the basic properties of software physics.
In general, work is performed when a change in state occurs. In
software physics, a processor executing an instruction will
perform work on a storage device when the processor causes a
symbol state change to occur. The standard symbol size is defined
as an eight bit byte, resulting in the following formal definition:

A processor performs one unit of software work (called
a "work", symbol w) on a storage device when it changes
the symbol state of one byte of storage.

The  instrumentation  problem  of  observing  if  a  transfer  of  one  byte 
to  storage  actually  causes  a  symbol  state  change  results  in  the 
following  operational  definition  of  software  work: 

A  processor  performs  one  unit  of  work  on  a  storage 
device  when  it  transfers  one  byte  to  that  storage  device. 

Software  work  is  measured  in  units  of  "work"  or  "works",  symbolized 
by  a  lower  case  w.  Normal  metric  prefixes  are  used  for  larger 
quantities;  i.e., 

1,000 works = 1 kilowork = 1 kw
1,000,000 works = 1 megawork = 1 mw
1,000,000,000 works = 1 gigawork = 1 gw
1,000 kw = 1 mw
1,000 mw = 1 gw

Software work has the property that the whole is simply equal to
the sum of its parts. For example, if the cpu work of some
software unit S₁ is W(S₁,cpu) and of some software unit S₂ is
W(S₂,cpu), then the cpu work performed by both is simply
W(S₁,cpu) + W(S₂,cpu). This is true whether S₁ and S₂ are
executed concurrently or sequentially.

It should be explicitly noted that a software unit is a collection
of executable code and data: the same executable code over
different data is formally a different unit of software. Therefore,
software physics does not imply that two different runs of the same
program over different data will result in the same quantity of
software work. In fact, since the term "data" in software physics
includes the sequence in which operands are presented, different
sequences of the same operands need not result in the same quantities
of software work.

Software  work  may  be  measured  directly  with  a  hardware  monitor  or 
software  monitor  in  many  instances.  However,  two  standard 
approximation  equations  are  generally  used.  These  are: 

1) work = (number of instructions executed) × (av. work/instruction)

2) work = (average power) × (seconds of execution time)

The first approximation equation is most often used when the number
of I/O read/write actions or instructions is known, along with the
average block size read or written. The equation thus becomes:

I/O work = (# I/O reads/writes) × (average block size)

The quantity "#EXCP's" is given by the IBM instrumentation software
known as SMF. Using this as an approximation to the number of I/O
reads and writes, one has

I/O work = (#EXCP's) × (average block size)

The second equation is used when the number of cpu seconds (of
execution time) is known. A hardware monitor is used to establish
the average cpu power, and the approximation equation then becomes:

cpu work = (av. cpu power) × (no. of cpu seconds)
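
The two approximation equations, together with the metric prefixes for works, can be sketched as follows. This is an illustrative example under stated assumptions: the function names are hypothetical, the counts and power figures are invented, and the block size is taken in bytes so that one byte transferred corresponds to one work under the operational definition.

```python
def io_work(excp_count, avg_block_size_bytes):
    """First approximation: number of I/O reads/writes (here an EXCP
    count) times the average block size in bytes gives I/O work in works."""
    return excp_count * avg_block_size_bytes

def cpu_work(avg_cpu_power, cpu_seconds):
    """Second approximation: average cpu power (works/second) times
    cpu seconds of execution time gives cpu work in works."""
    return avg_cpu_power * cpu_seconds

def format_work(works):
    """Express a quantity of work using the kw, mw, and gw prefixes."""
    for factor, unit in ((1e9, "gw"), (1e6, "mw"), (1e3, "kw")):
        if works >= factor:
            return f"{works / factor:g} {unit}"
    return f"{works:g} w"

# Hypothetical figures for a single job.
job_io = io_work(2500, 4096)           # 2,500 EXCPs of 4,096 bytes each
job_cpu = cpu_work(300_000.0, 20.0)    # 300 kw per second for 20 seconds

# Work is additive, so the job total is the sum of its parts.
job_total = job_io + job_cpu
print(format_work(job_io), format_work(job_cpu), format_work(job_total))
```

Because software work is additive, the total for the job is obtained by summing the cpu and I/O quantities directly.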


